Prognostic and diagnostic method for cancer therapy

ABSTRACT

The present invention relates to methods for the diagnosis and prognosis of aggressive forms of cancer using a combination of gene expression signatures.

RELATED APPLICATIONS

This application is a continuation of U.S. Ser. No. 11/732,442, filed Apr. 2, 2007, now issued as U.S. Pat. No. 7,890,267, which claims priority to U.S. Provisional Application Nos. 60/875,061, filed on Dec. 15, 2006; 60/823,577, filed on Aug. 25, 2006; 60/822,705, filed on Aug. 17, 2006; and 60/787,818, filed on Mar. 31, 2006, each of which is incorporated herein by reference in its entirety.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made using federal funds awarded by the National Institutes of Health, National Cancer Institute under contract number 1R01CA89827-01. The government has certain rights to this invention.

SEQUENCE LISTING

The instant application contains a sequence listing which has been submitted via EFS-Web and is hereby incorporated by reference in its entirety. The computer readable copy of the sequence listing in ASCII format was created on Apr. 13, 2011 and is entitled 26141_(—)507C01US_SeqList_ST25. The file is 855 bytes in size.

FIELD OF THE INVENTION

The invention relates to diagnostic and prognostic methods and kits for cancer therapy.

BACKGROUND

A wide variety of cancer treatment protocols have been developed in recent years. Often, very aggressive cancer therapy is reserved for late stage cancers due to unwanted side effects produced by such therapy. However, even such aggressive therapy commonly fails at such a late stage. The ability to identify cancers responsive only to the most aggressive therapies at an earlier stage could greatly improve the prognosis for patients having such cancers.

Only very recently, however, have markers predictive of such outcomes been identified. Glinsky, G. V. et al., J. Clin. Invest. 113: 913-923 (2004) teaches that gene expression profiling predicts clinical outcomes of prostate cancer. van 't Veer et al., Nature 415: 530-536 (2002) teaches that gene expression profiling predicts clinical outcomes of breast cancer. Glinsky et al., J. Clin. Invest. 115: 1503-1521 (2005) teaches that altered expression of the BMI1 oncogene is functionally linked with self-renewal state of normal and leukemic stem cells as well as a poor prognosis profile of an 11-gene death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. These studies utilized the microarray gene expression analysis approach.

There is, therefore, a need for methods for early diagnosis of cancer and for prognostic assays for cancer therapy that are readily adaptable to the clinical setting. Such methods should utilize technologies that can be readily carried out in clinical laboratories, and should accurately predict the resistance of various cancers to standard therapeutic regimens.

SUMMARY OF THE INVENTION

The present invention is directed to novel methods and kits for diagnosing the presence of cancer within a patient, and for determining whether a subject who has cancer is susceptible to different types of treatment regimens. The cancers to be tested include, but are not limited to, prostate, breast, lung, gastric, ovarian, bladder, lymphoma, mesothelioma, medulloblastoma, glioma, and AML.

One embodiment of the present invention is directed to a method for diagnosing cancer or predicting cancer-therapy outcome by detecting the expression levels of multiple markers in the same cell at the same time, and scoring their expression as being above a certain threshold, wherein the markers are from a particular pathway related to cancer, with the score being indicative of a cancer diagnosis or a prognosis for cancer-therapy failure. This method can be used to diagnose cancer or predict cancer-therapy outcomes for a variety of cancers. The simultaneous co-expression of at least two markers in the same cell from a subject is a diagnostic for cancer and a predictor for the subject to be resistant to standard cancer therapy. The markers can come from any pathway involved in the regulation of cancer, including specifically the PcG pathway and the “stemness” pathway. The markers can be mRNA, DNA, or protein.

These and other embodiments of the present invention rely at least in part upon the novel finding that the expression of multiple markers above a threshold level in the same cell at the same time, wherein the markers are found within pathways related to cancer, can be used as an assay to diagnose cancer disorders and to predict whether a patient already diagnosed with cancer will be therapy-responsive or therapy-resistant. An element of the assay is that two or more markers are detected simultaneously within the same cell. Marker detection can be made through a variety of detection means, including bar-coding through immunofluorescence. The markers detected can be a variety of products, including mRNA, DNA, and protein. For mRNA based markers, PCR can be used as a detection means. Additionally, protein products or gene copy number can be identified through detection means known in the art. The markers detected can be from a variety of pathways related to cancer. Suitable pathways for markers within the scope of the present invention include any pathways related to oncogenesis and metastasis, and more specifically include the Polycomb group (PcG) chromatin silencing pathway and the “stemness” pathway.

In another embodiment, the invention is directed to a method for diagnosing cancer or predicting cancer-therapy outcome in a subject, said method comprising the steps of:

a) obtaining a sample from the subject,

b) selecting a marker from a pathway related to cancer,

c) screening for a simultaneous aberrant expression level of two or more markers in the same cell from the sample, and

d) scoring their expression level as being aberrant when the expression level detected is above or below a certain detection threshold coefficient, wherein the detection threshold coefficient is determined by comparing the expression levels of the samples obtained from the subjects to values in a reference database of sample phenotypes obtained from subjects with either a known diagnosis or known clinical outcome after therapy, wherein the presence of an aberrant expression level of two or more markers in individual cells and presence of cells aberrantly expressing two or more such markers is indicative of a cancer diagnosis or a prognosis for cancer-therapy failure in the subject. The subset of markers to be used within the methods of the present invention includes any markers associated with cancer pathways.
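By way of a non-limiting illustration, steps c) and d) can be sketched in Python as counting the cells in which two or more markers are simultaneously scored as aberrant. The sketch assumes that per-cell marker levels are already available as numbers (for example, from quantitative immunofluorescence) and that "aberrant" means "above the threshold" only. The names Cell, aberrant_markers, and dual_positive_fraction, as well as all intensity and threshold values, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    """Marker levels measured in one cell (hypothetical units)."""
    intensities: Dict[str, float]


def aberrant_markers(cell: Cell, thresholds: Dict[str, float]) -> List[str]:
    """Return the markers whose level in this cell exceeds its threshold."""
    return [
        marker
        for marker, cutoff in thresholds.items()
        if cell.intensities.get(marker, 0.0) > cutoff
    ]


def dual_positive_fraction(
    cells: List[Cell], thresholds: Dict[str, float], min_markers: int = 2
) -> float:
    """Fraction of cells with at least min_markers aberrant markers at once."""
    if not cells:
        return 0.0
    hits = sum(
        1 for cell in cells
        if len(aberrant_markers(cell, thresholds)) >= min_markers
    )
    return hits / len(cells)


# Hypothetical thresholds and per-cell measurements, for illustration only.
thresholds = {"BMI1": 1.5, "Ezh2": 1.2}
cells = [
    Cell({"BMI1": 2.0, "Ezh2": 1.8}),  # both markers above threshold
    Cell({"BMI1": 2.1, "Ezh2": 0.4}),  # only BMI1 above threshold
    Cell({"BMI1": 0.3, "Ezh2": 0.2}),  # neither marker above threshold
    Cell({"BMI1": 1.9, "Ezh2": 1.3}),  # both markers above threshold
]
fraction = dual_positive_fraction(cells, thresholds)
```

A cell is counted only when both markers exceed their thresholds in that same cell, which reflects the same-cell requirement of step c). Scoring expression below a threshold as aberrant would require a second comparison per marker and is omitted here for brevity.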

In preferred embodiments, the markers can be selected from the genes identified in FIGS. 11-20. The markers can comprise anywhere ranging from two markers listed within each table up to the whole set of genes listed within each of these tables. The markers can comprise any percentage of genes selected from each of these tables, including 90%, 80%, 70%, 60%, or 50% of the genes identified in FIGS. 11-20.

In this method, an aberrant co-expression level of the markers can be indicative of the presence of cancer in the subject, or predictive of cancer-therapy failure in the subject. The markers can be selected from any suitable cancer pathway, including in preferred embodiments markers from the Polycomb or “stemness” pathway. These markers can be genes selected from the group consisting of ADA, AMACR+p63, ANK3, BCL2L1, BIRC5, BMI-1, BUB1, CCNB1, CCND1, CES1, CHAF1A, CRIP1, CRYAB, ESM1, EZH2, FGFR2, FOS, Gbx2, HCFC1, IER3, ITPR1, JUNB, KLF6, KI67, KNTC2, MGC5466, Phc1, RNF2, Suz12, TCF2, TRAP100, USP22, Wnt5A and ZFP36. In preferred embodiments, the markers are selected from the group consisting of BMI1, Ezh2, H2A, H3, transcription factors, and methylation patterns. In one preferred embodiment, the aberrant co-expression level detected is of BMI1 and Ezh2, and in another preferred embodiment the aberrant co-expression level detected is of H2A and H3. The markers being detected are in the form of either mRNA, DNA, or protein.

In a preferred embodiment, the sample phenotypes are selected from the group consisting of cancer, non-cancer, recurrence, non-recurrence, relapse, non-relapse, invasiveness, non-invasiveness, metastatic, non-metastatic, localized, tumor size, tumor grade, Gleason score, survival prognosis, lymph node status, tumor stage, degree of differentiation, age, hormone receptor status, PSA level, histologic type, and disease free survival.

The aberrant expression level of two or more markers can be detected by any detection means known in the art, including, but not limited to, subjecting the cells to an analysis selected from the group consisting of multicolor quantitative immunofluorescence co-localization analysis, fluorescence in situ hybridization, and quantitative RT-PCR analysis.

In another embodiment, the present invention is directed to a method of determining a detection threshold coefficient for classifying a sample phenotype from a subject, the method comprising the steps of:

a) obtaining a sample from the subject,

b) selecting two or more markers from a pathway related to cancer,

c) screening for a simultaneous aberrant expression level of the two or more markers in the same cell from the sample;

d) scoring the marker expression in the cells by comparing the expression levels of the samples obtained from the subjects to values in a reference database of samples obtained from subjects with either a known diagnosis or known clinical outcome after therapy, and

e) determining the detection threshold coefficient for the sample classification accuracy at different detection thresholds using a reference database of samples from subjects with known phenotypes.
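By way of a non-limiting illustration, step e) can be sketched in Python as evaluating classification accuracy on reference samples at several candidate thresholds and keeping the best-performing one. The sketch assumes that each reference sample has already been reduced to a single numeric score and a known binary phenotype (for example, therapy failure versus no failure). The function names accuracy_at and best_threshold and the reference values are hypothetical; the exact scoring procedure is not specified in this section.

```python
from typing import List, Sequence, Tuple


def accuracy_at(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> float:
    """Fraction of reference samples classified correctly at this threshold."""
    predicted = [score >= threshold for score in scores]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


def best_threshold(
    scores: Sequence[float], labels: Sequence[bool], candidates: Sequence[float]
) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the most accurate candidate.

    Ties keep the candidate that appears first in the list.
    """
    best: Tuple[float, float] = (candidates[0], -1.0)
    for threshold in candidates:
        acc = accuracy_at(scores, labels, threshold)
        if acc > best[1]:
            best = (threshold, acc)
    return best


# Hypothetical reference database: one score and one known outcome per sample.
reference_scores: List[float] = [0.2, 0.4, 0.55, 0.7, 0.9, 0.95]
reference_labels: List[bool] = [False, False, False, True, True, True]

# Candidate thresholds for the sweep (hypothetical grid).
candidates = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]

threshold, accuracy = best_threshold(reference_scores, reference_labels, candidates)
```

With these hypothetical values the sweep selects a threshold of 0.6, the first candidate that classifies all six reference samples correctly. In practice the selected threshold would be validated on samples not used for the sweep.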

Detection threshold coefficients which are indicative of a cancer diagnosis or a prognosis for cancer-therapy failure have an absolute value within the range of ≥0.5 to ≥0.999. Preferred levels of detection threshold coefficients which are indicative of a cancer diagnosis or a prognosis for cancer-therapy failure have an absolute value of ≥0.5, ≥0.6, ≥0.7, ≥0.8, ≥0.9, ≥0.95, ≥0.99, ≥0.995, and ≥0.999.

In another embodiment, the method further comprises determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype. In another embodiment, the method further comprises using the best performing magnitude of said detection threshold to score an unclassified sample and assign a sample phenotype to said sample.

In another embodiment, the present invention is directed to a method for simultaneously detecting an aberrant co-expression level of two or more markers in a single cell, said method comprising the steps of:

a) obtaining a sample of tissue,

b) selecting a marker defined by a pathway,

c) screening for a simultaneous aberrant expression level of the two or more markers, and

d) scoring their expression level as being aberrant when the expression level detected is above or below a certain detection threshold coefficient, wherein the detection threshold coefficient is determined by comparing the expression levels of the samples obtained from the subjects to values in a reference database of sample phenotypes obtained from subjects with either a known diagnosis or known clinical outcome after therapy.

The present invention is also directed to kits useful in detecting the simultaneous aberrant co-expression levels of two or more markers in a single cell.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows EZH2 siRNA-mediated changes of the transcript abundance levels of 11 genes comprising the BMI-1-pathway signature.

FIG. 2 shows siRNA-mediated changes of the transcript abundance levels of 11 genes comprising the BMI-1-pathway signature. A. BMI-1 siRNA. B. EZH2 siRNA.

FIG. 3 shows expression profiles of the 11-gene BMI-1 signature in distant metastatic lesions of the TRAMP transgenic mouse model of prostate cancer and PNS neurospheres.

FIG. 4 shows increased DNA copy numbers of the BMI-1 and Ezh2 genes in human prostate carcinoma cells selected for high metastatic potential.

FIG. 5 shows the quadruplicon of prostate cancer progression in the LNCaP progression model.

FIG. 6 shows the quadruplicon of prostate cancer progression in the PC-3 progression model.

FIG. 7 shows the quadruplicon of prostate cancer progression in the PC-3 bone metastasis progression model.

FIG. 8 shows that high expression levels of the BMI1 and Ezh2 oncoproteins in human prostate carcinoma metastasis precursor cells are associated with marked accumulation of a dual-positive high BMI1/Ezh2-expressing cell population and increased DNA copy number of the BMI1 and Ezh2 genes.

A quantitative reverse-transcription PCR (Q-RT-PCR) analysis of DNA copy numbers of the BMI1 and Ezh2 genes in multiple experimental models of human prostate cancer. Note marked increase of the BMI1 and Ezh2 gene copy numbers in highly metastatic variants compared to the low metastatic counterparts in the multiple independently selected lineages. The results of one of two independent experiments are shown.

FIG. 9 shows that targeted reduction of the BMI1 or Ezh2 expression increases sensitivity of human prostate carcinoma metastasis precursor cells to anoikis. Anoikis-resistant PC-3-32 prostate carcinoma cells were treated with BMI1- or Ezh2-targeting siRNAs and continuously monitored for expression levels of the various mRNAs, BMI1 and Ezh2 oncoproteins, as well as cell growth and viability under various culture conditions. PC-3-32 cells with reduced expression of either BMI1 or Ezh2 oncoproteins acquired sensitivity to anoikis as demonstrated by the loss of viability and increased apoptosis compared to the control LUC siRNA-treated cultures growing in detached conditions.

FIG. 10 shows that increased BMI1 and Ezh2 expression is associated with a high likelihood of therapy failure and disease relapse in prostate cancer patients after radical prostatectomy. Kaplan-Meier survival analysis demonstrates that cancer patients with more significant elevation of the BMI1 and Ezh2 expression [having higher tumor (T) to adjacent normal tissue (N) ratio, T/N: FIG. 10-1; or having tumors with higher levels of BMI1 or Ezh2 expression] are more likely to fail therapy and develop a disease recurrence after radical prostatectomy. FIG. 10-3 shows the Kaplan-Meier survival analysis of 79 prostate cancer patients stratified into five sub-groups using an eight-covariate cancer therapy outcome (CTO) algorithm. The CTO algorithm integrates individual prognostic powers of BMI1 and Ezh2 expression values and six clinico-pathological covariates (preoperative PSA, Gleason score, surgical margins, extra-capsular invasion, seminal vesicle invasion, and age).

FIG. 11 shows breast cancer CTOP signatures in Affymetrix format, with predictive outcomes.

FIG. 12 shows breast cancer CTOP signatures in Agilent Rosetta Chip format, with predictive outcomes.

FIG. 13 shows prostate cancer CTOP signatures in Affymetrix format, with predictive outcomes.

FIG. 14 shows the parent methylation signatures.

FIG. 15 shows the histones H3 and H2A CTOP signatures.

FIG. 16 shows the CTOP gene expression signatures for prostate cancer.

FIG. 17 shows the CTOP gene expression signatures for breast cancer.

FIG. 18 shows the CTOP gene expression signature and survival data for lung cancer.

FIG. 19 shows the CTOP gene expression signature for ovarian cancer.

FIG. 20 shows the CTOP gene expression signatures for breast cancer.

FIG. 21 shows CTOP scores for lung cancer.

FIG. 22 shows Kaplan-Meier survival analysis of seventy-nine prostate cancer patients stratified into sub-groups with distinct expression profiles of the individual Polycomb pathway ESC signatures (top six panels) or six ESC signatures algorithm (bottom panel) in primary prostate tumors. In each individual signature panel, patients were sorted in descending order based on the values of the corresponding signature CTOP scores and divided into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) sub-groups. In the last panel, patients were sorted in descending order based on the values of the cumulative CTOP scores and divided into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) sub-groups. The cumulative CTOP scores represent the sum of the six individual CTOP scores calculated for each patient.

FIG. 23 shows Kaplan-Meier survival analysis of two-hundred eighty-six early-stage LN negative breast cancer patients stratified into sub-groups with distinct expression profiles of the individual Polycomb pathway ESC signatures (top six panels) or six ESC signatures algorithm (single middle panel) in primary breast tumors. Bottom four panels show patients' classification performance of the six ESC signatures algorithm in four different breast cancer therapy outcome data sets. Patients' stratification was performed using either individual CTOP scores (top six panels) or cumulative CTOP scores (bottom five panels) as described in the legend to the FIG. 22.

FIG. 24 shows bivalent chromatin domain-containing transcription factors (BCD-TF) manifest “stemness” expression profiles in therapy-resistant prostate and breast tumors.

1. Chromatin context identified by the presence of histones harboring specific modifications of the histone tails defines mutually exclusive transcriptionally active or silent states of corresponding genetic loci in genomes of most cells. In ESC, multiple chromosomal regions were identified simultaneously harboring both “silent” (H3K27met3) and “active” (H3K4) histone marks, and ~100 transcription factor (TF) encoding genes reside within these bivalent chromatin domain-containing chromosomal regions. Expression of selected TF encoding genes in ESC, including bivalent chromatin domain-containing TF genes (BCD-TF), maintenance of a “stemness” state, and transition to differentiated phenotypes is regulated by the balance of the “stemness” TFs (Nanog, Sox2, Oct4) and Polycomb group (PcG) proteins bound to the promoters of target genes. 2. Thirteen-gene BCD-TF signature manifesting highly concordant (r=0.853; P<0.001) gene expression profiles in breast and prostate tumors from patients with therapy-resistant disease phenotypes. 3. Eight-gene BCD-TF signature (derived from thirteen-gene BCD-TF signatures) manifesting highly concordant expression profiles (r=0.716; p<0.001) in ESC and therapy-resistant breast and prostate tumors. Kaplan-Meier analysis demonstrates that prostate and breast cancer patients with tumors harboring ESC-like expression profiles of the eight-gene BCD-TF signature are more likely to fail therapy (bottom two panels). Gene expression profiles of clinical samples were independently generated for therapy-resistant breast and prostate tumors using multivariate Cox regression analysis of microarrays of tumor samples from 286 breast cancer and 79 prostate cancer patients with known long-term clinical outcome after therapy. Gene expression profiles of mouse ESC were derived by comparing microarray analyses of pluripotent self-renewing ESC (control ESC cultures treated with HP siRNA) versus ESC treated with Esrrb siRNA (day 6). At this time point, Esrrb siRNA-treated ESC do not manifest a “stemness” phenotype and form colonies of differentiated cells.

FIG. 25 shows Kaplan-Meier survival analysis of two-hundred eighty-six early-stage LN negative breast cancer patients (top four panels) and seventy-nine prostate cancer patients (bottom four panels) stratified into sub-groups with distinct expression profiles of the individual CTOP signatures [bivalent chromatin domain transcription factors (BCD-TF) and ESC pattern 3 signatures], eight ESC signatures algorithm, and nine “stemness” signatures algorithm in primary breast or prostate tumors. Patients' stratification was performed using either individual CTOP scores (for individual signatures) or cumulative CTOP scores (for CTOP algorithms) as described in the legend to the FIG. 22.

FIG. 26 shows Kaplan-Meier survival analysis of seventy-nine prostate cancer patients (top four panels) and ninety-seven early-stage LN negative breast cancer patients (middle four panels) stratified into sub-groups with distinct expression profiles of the individual CTOP signatures [histones H3 and H2A signatures; Polycomb (PcG) pathway methylation signature] and two signatures PcG methylation/histones H3/H2A algorithm (bottom two panels) in primary prostate and breast tumors. Patients' stratification was performed using either individual CTOP scores (for individual signatures) or cumulative CTOP scores (for CTOP algorithm) as described in the legend to the FIG. 22.

FIG. 27 shows Kaplan-Meier survival analysis of two-hundred eighty-six early-stage LN negative breast cancer patients (top left panel), seventy-nine prostate cancer patients (top right panel), ninety-one early-stage lung cancer patients (bottom left panel), and one-hundred thirty-three ovarian cancer patients (bottom right panel) stratified into sub-groups with distinct expression profiles of the nine “stemness” signatures algorithm in primary breast, prostate, lung, and ovarian tumors. Patients' stratification was performed using cumulative CTOP scores of the nine “stemness” signatures as described in the legend to the FIG. 22. Patients were sorted in descending order based on the values of the cumulative CTOP scores and divided into five sub-groups at 20% increment of the cumulative CTOP score values.
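By way of a non-limiting illustration, the patient stratification described in the legends to FIG. 22 and FIG. 27 can be sketched in Python as follows: a cumulative CTOP score is the sum of a patient's individual signature scores, patients are sorted in descending order of that score, and the sorted list is divided into two halves (FIG. 22) or five sub-groups (FIG. 27). The sketch assumes the division is made by patient rank, which is one possible reading of "20% increment"; the function names and the patient scores are hypothetical.

```python
from typing import Dict, List


def cumulative_scores(signature_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the individual signature scores of each patient."""
    return {patient: sum(scores) for patient, scores in signature_scores.items()}


def stratify(cumulative: Dict[str, float], n_groups: int) -> Dict[str, int]:
    """Sort patients by descending score and split them into n_groups by rank.

    Group 0 holds the highest scores (poorest predicted prognosis).
    """
    ranked = sorted(cumulative, key=cumulative.get, reverse=True)
    n = len(ranked)
    return {patient: index * n_groups // n for index, patient in enumerate(ranked)}


# Ten hypothetical patients, each with six individual signature scores.
signature_scores = {f"P{i}": [float(i)] * 6 for i in range(1, 11)}

cumulative = cumulative_scores(signature_scores)
halves = stratify(cumulative, 2)     # top 50% versus bottom 50%
quintiles = stratify(cumulative, 5)  # five sub-groups of 20% each
```

In this sketch, group 0 of the two-group split corresponds to the poor prognosis sub-group (top 50% of scores) and group 1 to the good prognosis sub-group (bottom 50% of scores).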

DETAILED DESCRIPTION

The present invention is directed to novel methods and kits for diagnosing the presence of cancer within a patient, and for determining whether a subject who has cancer is susceptible to different types of treatment regimens. The cancers to be tested include, but are not limited to, prostate, breast, lung, gastric, ovarian, bladder, lymphoma, mesothelioma, medulloblastoma, glioma, mantle cell lymphoma, and AML.

In some embodiments, the kits and methods of the present invention can be used to predict various different types of clinical outcomes. For example, the invention can be used to predict recurrence of disease state after therapy, non-recurrence of a disease state after therapy, therapy failure, short interval to disease recurrence (e.g., less than two years, or less than one year, or less than six months), short interval to metastasis in cancer (e.g., less than two years, or less than one year, or less than six months), invasiveness, non-invasiveness, likelihood of metastasis in cancer, likelihood of distant metastasis in cancer, poor survival after therapy, death after therapy, disease free survival and so forth.

One embodiment of the present invention is directed to a method for diagnosing cancer or predicting cancer-therapy outcome by detecting the expression levels of multiple markers in the same cell at the same time, and scoring their expression as being above a certain threshold, wherein the markers are from a particular pathway related to cancer, with the score being indicative of a cancer diagnosis or a prognosis for cancer-therapy failure. This method can be used to diagnose cancer or predict cancer-therapy outcomes for a variety of cancers. The simultaneous co-expression of at least two markers in the same cell from a subject is a diagnostic for cancer and a predictor for the subject to be resistant to standard cancer therapy. The markers can come from any pathway involved in the regulation of cancer, including specifically the PcG pathway and the “stemness” pathway. The markers can be mRNA (messenger RNA), DNA, microRNA, or protein.

The subset of markers to be used within the methods of the present invention includes any markers associated with cancer pathways. In preferred embodiments, the markers can be selected from the genes identified in FIGS. 11-20. The markers can comprise anywhere ranging from two markers listed within each table up to the whole set of genes listed within each of these tables. The markers can comprise any percentage of genes selected from each of these tables, including 90%, 80%, 70%, 60%, or 50% of the genes identified in each of FIGS. 11-20.

These and other embodiments of the present invention rely at least in part upon the novel finding that the expression of multiple markers above a threshold level in the same cell at the same time, wherein the markers are found within pathways related to cancer, can be used as an assay to diagnose cancer disorders and to predict whether a patient already diagnosed with cancer will be therapy-responsive or therapy-resistant. An element of the assay is that two or more markers are detected simultaneously within the same cell.

Marker detection can be made through a variety of detection means, including bar-coding through immunofluorescence. The markers detected can be a variety of products, including mRNA, DNA, microRNA, and protein. For mRNA or microRNA based markers, PCR can be used as detection means. Additionally, protein products, gene expression, or gene copy number can be identified through detection means known in the art.

Detection means, in case of a nucleic acid probe, include measuring the level of mRNA or cDNA to which a probe has been engineered to bind, where the probe binds the intended species and provides a distinguishable signal. In some embodiments, the probes are affixed to a solid support, such as a microarray. In other embodiments, the probes are primers for nucleic acid amplification of a set of genes. Q-RT-PCR amplification can be used. Detecting expression for measurement or determining protein expression levels can also be accomplished by using a specific binding reagent, such as an antibody. In general, expression levels of the markers can be analyzed by any method now known or later developed to assess gene expression, including but not limited to measurements relating to the biological processes of nucleic acid amplification, transcription, RNA splicing, and translation. Direct and indirect measures of gene copy number (e.g., as by fluorescence in situ hybridization or other type of quantitative hybridization measurement, or by quantitative PCR), transcript concentration (e.g., as by Northern blotting, expression array measurements, quantitative RT-PCR, or comparative genomic hybridization) and protein concentration (e.g., as by quantitative 2D gel electrophoresis, mass spectrometry, Western blotting, ELISA, or other method for determining protein concentration), can also be used.

One of skill in the art would recognize that different affinity reagents could be used with the present invention, such as one or more antibodies (monoclonal or polyclonal) and the invention can include using techniques, such as ELISA, for the analysis. Thus, specific antibodies (specific to the markers to be detected) can be used in a kit and in methods of the present invention. In a kit of the present invention, the kit would include reagents and instructions for use, where the reagents could be protein-specific differentially-labeled fluorescent antibodies; protein-specific antibodies from different species (mouse, rabbit, goat, chicken, etc.) and differentially labeled species-specific antibodies; DNA and RNA-based probes with different fluorescent dyes; or bar-coded nucleic acid- and protein-specific probes (each probe having a unique combination of colors).
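By way of a non-limiting illustration, the bar-coding of probes with unique combinations of colors can be sketched in Python as assigning each marker a distinct subset of the available fluorescent labels. The sketch assumes a fixed number of colors per probe and treats a bar code as an unordered set of colors. The function name assign_barcodes, the color names, and the choice of two colors per probe are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def assign_barcodes(
    markers: List[str], colors: List[str], colors_per_probe: int = 2
) -> Dict[str, Tuple[str, ...]]:
    """Give every marker its own combination of colors.

    Raises ValueError when there are more markers than distinct combinations.
    """
    codes = combinations(colors, colors_per_probe)
    assignment: Dict[str, Tuple[str, ...]] = {}
    for marker in markers:
        code = next(codes, None)
        if code is None:
            raise ValueError("not enough distinct color combinations for the markers")
        assignment[marker] = code
    return assignment


# Hypothetical label set; four colors taken two at a time give six bar codes.
colors = ["green", "red", "far-red", "blue"]
markers = ["BMI1", "Ezh2", "H2A", "H3"]
barcodes = assign_barcodes(markers, colors)
```

Because no two markers share the same combination, the set of colors observed on a probe identifies the marker it detects, which is what allows several markers to be read out in the same cell.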

The markers detected can be from a variety of pathways related to cancer. Suitable pathways for markers within the scope of the present invention include any pathways related to oncogenesis and metastasis, and more specifically include the Polycomb group (PcG) chromatin silencing pathway and the “stemness” pathway.

Representative cancer pathways within the context of the present invention include, but are not limited to, the Polycomb pathway, the Polycomb pathway target genes, “stemness” pathways, DNA methylation pathways, BMI1, Ezh2, Suz12, Suz12/PolII, EED, PcG-TF, BCD-TF, TEZ, Nanog/Sox2/Oct4, Myc, Her2/neu, CCND1, E2F3, PI3K, beta-catenin, ras, src, PTEN, p53, Rb, p16/ARF, p21, Wnt, and Hh pathways.

The Polycomb group (PcG) gene BMI1 is required for the proliferation and self-renewal of normal and leukemic stem cells. Over-expression of Bmi1 oncogene causes neoplastic transformation of lymphocytes and plays an essential role in the pathogenesis of myeloid leukemia. Another PcG protein, Ezh2, has been implicated in metastatic prostate and breast cancers, suggesting that PcG pathway activation is relevant for epithelial malignancies. Here it is demonstrated that activation of the BMI1 oncogene-associated PcG pathway plays an essential role in metastatic prostate cancer, thus mechanistically linking the pathogenesis of leukemia, self-renewal of stem cells, and prostate cancer metastasis.

In another aspect, the methods of the present invention provide for the diagnosis, prognosis, and treatment strategy for a patient with a disorder of the above mentioned types. Treatment includes determining whether a patient has an expression pattern of markers associated with cancer and administering to the patient a therapeutic adapted to the treatment of the disorder. In one embodiment, the method can include the identification of increased BMI1 and Ezh2 expression and the formulation of a treatment plan specific to this phenotype.

In another embodiment of the present invention, the detection of appropriate or inappropriate activation of “stemness” genetic pathways can be used to diagnose cancer and to predict the likelihood of cancer therapy success or failure. Inappropriate activation of “stemness” genes in cancer cells may be associated with aggressive clinical behavior and increased likelihood of therapy failure. A sub-set of human prostate tumors represents a genetically distinct highly malignant sub-type of prostate carcinoma with high propensity toward metastatic dissemination even at the early stage of disease. Such a high propensity toward metastatic dissemination of this type of prostate tumors is associated with the early engagement of normal stem cells into malignant process. Elucidation of such inappropriate activation of “stemness” gene expression can help tailor cancer therapy to a patient's individual needs.

The invention is directed to prognostic assays for cancer therapy that can be used to diagnose cancer and to predict the resistance of various cancers to standard therapeutic regimens. The invention is directed to methods and compositions for predicting the outcome of cancer therapy for individual patients. In one embodiment, the method is used to predict whether a particular cancer patient will be therapy-responsive or therapy-resistant. The invention can be used with a variety of cancers, including but not limited to, breast, prostate, ovarian, lung, glioma, and lymphoma.

The invention is directed to personalized medicine for cancer patients, and encompasses the selection of treatment options with the highest likelihood of successful outcome for individual cancer patients. The present invention is directed to the use of an assay to predict the outcome after therapy in patients with early stage cancer and provide additional information at the time of diagnosis with respect to likelihood of therapy failure.

In another embodiment of the present invention, the detection of the state of transcription factors can be used to diagnose the presence of cancer and to predict the likelihood of cancer therapy success or failure. The determination of a common pattern of the transcription factor expression can be used as a profile to help determine clinical outcome. The invention is also directed to a particular sub-set of BCD-TF genes defined here as the eight gene BCD-TF signature that manifests “stemness” expression profiles in therapy-resistant prostate and breast tumors (FIG. 24).

In another embodiment of the present invention, the detection of the methylation state of target genes can be used to diagnose cancer and to predict the likelihood of cancer therapy success or failure. More particularly, PcG target genes with promoters frequently hypermethylated in cancer manifest distinct expression profiles associated with therapy-resistant and therapy-sensitive prostate and breast cancers (FIG. 25), implying that differences in gene expression between tumors with distinct outcome after therapy may be driven, in part, by the distinct promoter hypermethylation patterns of the PcG target genes. These differences can be exploited to generate highly informative gene expression signatures of the PcG target genes hypermethylated in cancer for stratification of prostate and breast cancer patients into sub-groups with statistically distinct likelihood of therapy failure (FIG. 25).

The invention involves both a method to classify patients into sub-groups predicted to be either therapy-responsive or therapy-resistant, and a method for determining alternate therapies for patients who are classified as resistant to standard cancer therapies. The method of the present invention is based on an accurate classification of patients into subgroups with poor and good prognosis reflecting a different probability of disease recurrence and survival after standard therapy.

In one embodiment, the invention relates to a method for diagnosing cancer or predicting cancer-therapy outcome in a subject, said method comprising the steps of:

a) obtaining a sample from the subject,

b) selecting a marker from a pathway related to cancer,

c) screening for a simultaneous aberrant expression level of two or more markers in the same cell from the sample, and

d) scoring their expression level as being aberrant when the expression level detected is above or below a certain detection threshold coefficient, wherein the detection threshold coefficient is determined by comparing the expression levels of the samples obtained from the subjects to values in a reference database of samples obtained from subjects with either a known diagnosis or known clinical outcome after therapy, wherein the presence of an aberrant expression level of two or more markers in individual cells and presence of cells aberrantly expressing two or more such markers is indicative of a cancer diagnosis or a prognosis for cancer-therapy failure in the subject.
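Steps c) and d) can be illustrated with a minimal sketch. It is not the assay itself: the `Marker` class, the function names, the per-cell expression values, and the threshold of 1.0 are all hypothetical and chosen only for illustration. In practice the thresholds would be derived from a reference database as described in the text.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    threshold: float                 # hypothetical detection threshold
    aberrant_if_above: bool = True   # False means "aberrant when below threshold"


def is_aberrant(level, marker):
    """Score one measurement against one marker's threshold."""
    if marker.aberrant_if_above:
        return level > marker.threshold
    return level < marker.threshold


def cell_is_multi_aberrant(cell, markers, min_markers=2):
    """True when at least `min_markers` markers are aberrant in the same cell."""
    hits = sum(
        1 for m in markers
        if m.name in cell and is_aberrant(cell[m.name], m)
    )
    return hits >= min_markers


def score_sample(cells, markers):
    """Count cells that simultaneously show aberrant levels of two or more markers."""
    flagged = [c for c in cells if cell_is_multi_aberrant(c, markers)]
    return {
        "n_cells": len(cells),
        "n_aberrant_cells": len(flagged),
        "indicative": len(flagged) > 0,
    }


# Hypothetical per-cell measurements (arbitrary units) for two markers.
markers = [Marker("BMI1", 1.0), Marker("Ezh2", 1.0)]
cells = [
    {"BMI1": 2.1, "Ezh2": 1.8},   # both markers above threshold in the same cell
    {"BMI1": 2.5, "Ezh2": 0.4},   # only one marker above threshold
    {"BMI1": 0.3, "Ezh2": 0.2},   # neither marker above threshold
]
result = score_sample(cells, markers)
print(result)
```

The sketch deliberately scores each cell before summarizing the sample, because the method requires the aberrant markers to be present in the same cell rather than merely in the same sample.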

An aberrant expression level is a level of expression that can either be higher or lower than the expression level as compared to reference samples. The reference samples can have a variety of phenotypes, including both diseased phenotypes and non-diseased phenotypes. The sample phenotypes within the scope of the present invention include, but are not limited to, cancer, non-cancer, recurrence, non-recurrence, relapse, non-relapse, invasiveness, non-invasiveness, metastatic, non-metastatic, localized, tumor size, tumor grade, Gleason score, survival prognosis, lymph node status, tumor stage, degree of differentiation, age, hormone receptor status, PSA level, histologic type, and disease free survival.

A detection threshold coefficient within the context of the present invention is a value above which or below which a patient or sample can be classified as either being indicative of a cancer diagnosis or a prognosis for cancer-therapy failure. The detection threshold coefficients are defined by: taking a plurality of measurements of samples in the reference database; sorting the samples in descending order of the values of measurements; assigning the probability of samples having a phenotype in sub-groups of samples defined at different increments of the values of measurements (e.g., samples comprising the top 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the values); and selecting the statistically best-performing detection threshold coefficient, defined as the value of measurements segregating samples with values below and above the threshold into subgroups with statistically distinct probabilities of having a phenotype (cancer vs non-cancer; therapy failure vs cure; etc.), ideally segregating patients into subgroups with 100% probability of therapy failure and with 100% probability of a cure, or as close to these probability values as practically possible.

This value of the marker measurements is defined as the best performing magnitude of the detection threshold. The samples of unknown phenotype are then placed into corresponding subgroups based on the values of marker measurements and assigned the corresponding probability of having a phenotype. To determine these measurements, one skilled in the art can utilize different statistical programs and approaches such as univariate and multivariate Cox regression analysis and Kaplan-Meier survival analysis.
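The threshold-selection procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the reference data are invented, the function names are hypothetical, and the "best-performing" threshold is chosen by the largest difference in observed failure probability between the two subgroups. That criterion is a stand-in for the Cox regression and Kaplan-Meier analyses named in the text, not a replacement for them.

```python
def select_threshold(reference, percents=range(10, 100, 10)):
    """Pick a cutoff from reference samples given as (measurement, failed) pairs.

    Samples are sorted in descending order of measurement; at each top-N%
    increment the failure probability above and below the cut is compared.
    """
    ranked = sorted(reference, key=lambda s: s[0], reverse=True)
    n = len(ranked)
    best = None
    for pct in percents:
        k = n * pct // 100            # number of samples in the top pct%
        if k == 0 or k == n:
            continue                  # both subgroups must be non-empty
        top, rest = ranked[:k], ranked[k:]
        p_above = sum(failed for _, failed in top) / len(top)
        p_below = sum(failed for _, failed in rest) / len(rest)
        candidate = {
            "percent": pct,
            "threshold": top[-1][0],  # lowest measurement still in the top group
            "p_above": p_above,
            "p_below": p_below,
            "separation": abs(p_above - p_below),
        }
        if best is None or candidate["separation"] > best["separation"]:
            best = candidate
    return best


def classify(value, best):
    """Place a sample of unknown phenotype into a subgroup and return its probability."""
    if value >= best["threshold"]:
        return "above", best["p_above"]
    return "below", best["p_below"]


# Hypothetical reference database: (marker measurement, therapy failed?)
reference = [(10, True), (9, True), (8, True), (7, True), (6, False),
             (5, False), (4, False), (3, False), (2, False), (1, False)]
best = select_threshold(reference)
print(best)
print(classify(8.5, best), classify(3, best))
```

With tied measurements at the cut, the top-N% grouping and the `>=` rule in `classify` can disagree; a real implementation would need an explicit tie-handling rule.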

Detection threshold coefficients which are indicative of a cancer diagnosis or a prognosis for cancer-therapy failure have an absolute value within the range of ≥0.5 to ≥0.999. Preferred levels of detection threshold coefficients which are indicative of a cancer diagnosis or a prognosis for cancer-therapy failure have an absolute value of ≥0.5, ≥0.6, ≥0.7, ≥0.8, ≥0.9, ≥0.95, ≥0.99, ≥0.995, and ≥0.999.

The present invention is also directed to a method of determining detection threshold coefficients for classifying a sample phenotype from a subject. This method comprises the steps of selecting two or more markers from a pathway related to cancer, screening for a simultaneous aberrant expression level of the two or more markers in the same cell from the sample and scoring the marker expression in the cells by comparing the expression levels of the samples obtained from the subjects to values in a reference database of samples obtained from subjects with either a known diagnosis or known clinical outcome after therapy, and determining the sample classification accuracy at different detection thresholds using reference database of samples from subjects with known phenotypes.

In another embodiment, the method of determining detection threshold coefficients for classifying a sample phenotype from a subject further comprises the additional step of determining the best performing magnitude of said detection threshold and using said magnitude to assess the reliability of said established detection threshold in classifying a sample phenotype.

The statistically best-performing detection threshold coefficient is defined as the value of measurements that segregates samples with values below and above the threshold into subgroups with a statistically distinct probability of having a phenotype (cancer vs non-cancer; therapy failure vs cure, etc.). More preferably, patients or samples can be segregated into subgroups with 100% probability of therapy failure and with 100% probability of a cure, or as close to these probability values as practically possible. This value of the marker measurements is defined as the best performing magnitude of the detection threshold. Additionally, the best performing magnitude of the detection threshold coefficient can be used to score an unclassified sample and assign a sample phenotype to said sample.

The present invention is also directed to a kit to detect the presence of two or more markers from a pathway related to cancer. The kit can contain as detection means protein-specific differentially-labeled fluorescent antibodies; protein-specific antibodies from different species (mouse, rabbit, goat, chicken, etc.) and differentially labeled species-specific antibodies; DNA and RNA-based probes with different fluorescent dyes; bar-coded nucleic acid- and protein-specific probes (each probe having a unique combination of colors), and any other detection means known in the art. The kit can include a marker sample collection means and a means for determining whether the sample expresses in the same cell at the same time two or more markers from a pathway related to cancer. Optionally, the kit contains a standard and/or an algorithmic device for assessing the results and additional reagents and components including, for example, DNA amplification reagents, DNA polymerase, nucleic acid amplification reagents, restriction enzymes, buffers, a nucleic acid sampling device, DNA purification device, deoxynucleotides, oligonucleotides (e.g. probes and primers), etc.

The following non-standard abbreviations are used herein: DFI, disease-free interval; FBS, fetal bovine serum; MSKCC, Memorial Sloan-Kettering Cancer Center; NPEC, normal prostate epithelial cells; PC, prostate cancer; PSA, prostate specific antigen; Q-RT-PCR, quantitative reverse-transcription polymerase chain reaction; RP, radical prostatectomy; SKCC, Sidney Kimmel Cancer Center; AMACR, alpha-methylacyl-coenzyme A racemase; Ezh2, enhancer of zeste homolog 2; FACS, fluorescence activated cell sorting.

Human Genome Haplotype Map Leads to Identification of Relevant Markers

The recent completion of the initial phase of a haplotype map of the human genome provides an opportunity for integrative analysis on a genome-wide scale of microarray-based gene expression profiling and SNP variation patterns for discovery of cancer-causing genes and genetic markers of therapy outcome. Here the approach is used for analysis of SNPs of cancer-associated genes, expression profiles of which predict the likelihood of treatment failure and death after therapy in patients diagnosed with multiple types of cancer. Unexpectedly, the analysis reveals a common SNP pattern for a majority (60 of 74; 81%) of analyzed cancer treatment outcome predictor (CTOP) genes.

The analysis suggests that heritable germ-line genetic variations driven by a geographically localized form of natural selection determining population differentiations may have a significant impact on cancer treatment outcome by influencing the individual's gene expression profile. A CTOP algorithm can be built which combines the prognostic power of multiple gene expression-based CTOP models. Application of a CTOP algorithm to large databases of early-stage breast and prostate tumors identifies cancer patients with 100% probability of a cure with existing cancer therapies as well as patients with nearly 100% likelihood of treatment failure, thus providing a clinically feasible framework essential for the introduction of rational evidence-based individualized therapy selection and prescription protocols.

Relevant Genes for Cancer Diagnosis and Treatment Prediction

Genes considered to be in an “elite” group for use in predicting clinically relevant models are included in Table 1 below. These were generated by an analysis of the extensive genome-wide database of SNPs generated after the completion of the initial phase of the international HapMap project. The initial effort was focused on 1) an analysis of the BMI1 oncogene, altered expression of which was functionally linked with the self-renewal state of normal and leukemic stem cells, and 2) a poor prognosis profile of an 11-gene death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. A prominent feature of the BMI1-associated SNP pattern is YRI population-specific profiles of genotype and allele frequencies of multiple SNPs. Intriguingly, similar population-specific SNP profiles are readily discernible for most of the loci comprising the 11-gene CTOP signature. Furthermore, this common SNP pattern is apparent for a majority of the genetic loci whose expression profiles are predictive of therapy failure in prostate cancer patients after prostatectomy. Finally, 86% of the genetic loci comprising a proteomics-based 50-gene CTOP signature predicting therapy outcome in patients diagnosed with multiple types of cancer show population differentiation profiles of SNPs.

Based on this analysis it is concluded that CTOP genes manifest a common feature of SNP patterns reflected in population-specific profiles of SNP genotype and allele frequencies. A majority of the population-specific SNPs associated with CTOP genes are represented by YRI population-differentiation SNPs, perhaps reflecting a general trend toward a higher level of low-frequency alleles in the YRI population compared to the CEU, CHB, and JPT populations due to bottlenecks in the history of non-YRI populations. During the survey of the population-specific SNPs associated with CTOP genes, five non-synonymous coding SNPs were identified that represented good candidates for follow-up functional studies.
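As a rough illustration of what a population-specific allele frequency profile means in practice, the sketch below computes allele frequencies from genotype counts in the four HapMap populations and flags a SNP whose frequency in one population differs from every other population. The genotype counts are invented, the function names are hypothetical, and the 0.3 frequency-difference cutoff is an assumption made for illustration; the text does not state the criterion that was actually applied.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A given counts of AA, AB, and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_aa + n_ab) / total_alleles


def is_population_specific(freqs, population, min_diff=0.3):
    """True when `population` differs from every other population by >= min_diff.

    `min_diff` is an assumed, illustrative cutoff.
    """
    target = freqs[population]
    return all(
        abs(target - f) >= min_diff
        for name, f in freqs.items()
        if name != population
    )


# Hypothetical genotype counts (AA, AB, BB) for one SNP in each population.
genotype_counts = {
    "YRI": (40, 15, 5),
    "CEU": (5, 20, 35),
    "CHB": (2, 13, 30),
    "JPT": (3, 12, 30),
}
freqs = {pop: allele_frequency(*counts) for pop, counts in genotype_counts.items()}
for pop, f in freqs.items():
    print(pop, round(f, 3), is_population_specific(freqs, pop))
```

In this invented example only the YRI frequency stands apart from the other three populations, which is the kind of pattern the paragraph above describes.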

Oncogenes and tumor suppressor genes manifest population-specific profiles of SNP genotype and allele frequencies. Interestingly, in addition to CTOP genes, population-specific SNP patterns are readily discernible for genes with a well-established causal role in cancer as oncogenes or tumor suppressor genes, implying that the genes are targets for a geographically localized form of natural selection. Taken together, the data suggest the presence of population differentiation-associated cancer-related patterns of SNPs spanning multiple chromosomal loci and, perhaps, forming a genome-scale cancer haplotype pattern. The data suggest that a block-like structure and low haplotype diversity leading to substantial correlations of SNPs with many of their neighbors may span beyond small chromosomal regions, and these “haplotype principles” may be extended to include multiple chromosomal loci, perhaps on a genome-wide scale. Of note, gene expression signatures associated with deregulation of the corresponding oncogenic pathways for most genes provide clinically relevant CTOP models.

Genes considered to be in an “elite” group for use in predicting clinically relevant CTOP models are included in Table 1 below.

TABLE 1. Elite set of genes and availability of antibodies for detection of corresponding protein products selected for development of diagnostic and prognostic applications.

| Gene name | UniGene | Company | Host | Signature |
|---|---|---|---|---|
| ADA | Hs.407135 | Santa Cruz Biotechnology, Inc. | rabbit IgG | M |
| AMACR + p63 | | Abcam | mouse IgG2a | PC marker |
| ANK3 | Hs.440478 | Santa Cruz Biotechnology, Inc. | mouse monoclonal IgG1 | DFC |
| BCL2L1 | Hs.305890 | Santa Cruz Biotechnology, Inc. | rabbit IgG | M |
| BIRC5 | Hs.1578 | Santa Cruz Biotechnology, Inc. | mouse IgG2a | DFC |
| BMI-1 | NM_005180 | Upstate | mouse monoclonal IgG1 | DFC |
| BMI-1 | NM_005180 | Santa Cruz Biotechnology, Inc. | rabbit polyclonal IgG | DFC |
| BUB1 | Hs.287472 | Chemicon | mouse monoclonal | DFC |
| CCNB1 | Hs.23960 | Santa Cruz Biotechnology, Inc. | mouse monoclonal IgG1 | DFC |
| CCND1 | Hs.523852 | Santa Cruz Biotechnology, Inc. | rabbit IgG | DFC |
| CES1 | Hs.499222 | Santa Cruz Biotechnology, Inc. | goat polyclonal | DFC |
| CHAF1A | Hs.79018 | Santa Cruz Biotechnology, Inc. | rabbit IgG polyclonal | R |
| CRIP1 | Hs.70327 | BD Biosciences Pharmingen | mouse monoclonal | M |
| CRYAB | Hs.408767 | Santa Cruz Biotechnology, Inc. | rabbit IgG | M |
| ESM1 | Hs.410668 | Santa Cruz Biotechnology, Inc. | goat IgG | M |
| EZH2 | Hs.444082 | Upstate | rabbit polyclonal | DFC |
| FGFR2 | Hs.404081 | Santa Cruz Biotechnology, Inc. | mouse IgG2b | DFC |
| FOS | Hs.25647 | Calbiochem | rabbit polyclonal | R |
| Gbx2 | Hs.184945 | Chemicon | rabbit polyclonal | DFC |
| HCFC1 | Hs.83634 | Santa Cruz Biotechnology, Inc. | goat polyclonal IgG | DFC |
| IER3 | Hs.76095 | Santa Cruz Biotechnology, Inc. | goat IgG polyclonal | R |
| ITPR1 | Hs.149900 | Abcam | rabbit polyclonal | R |
| JUNB | Hs.25292 | Santa Cruz Biotechnology, Inc. | rabbit IgG | R |
| KLF6 | Hs.285313 | Santa Cruz Biotechnology, Inc. | rabbit IgG | R |
| KI67 | Hs.80976 | Santa Cruz Biotechnology, Inc. | mouse monoclonal IgG1 | DFC |
| KNTC2 | Hs.414407 | BD Biosciences Pharmingen | mouse monoclonal IgG1 | DFC |
| MGC5466 | Hs.370367 | Under development | | R |
| RNF2 | Hs.124186 | Under development | | DFC |
| Suz12 | Hs.462732 | Abcam | rabbit polyclonal IgG | DFC |
| TCF2 | Hs.408093 | Santa Cruz Biotechnology, Inc. | goat polyclonal | R |
| TRAP100 | Hs.23106 | Santa Cruz Biotechnology, Inc. | goat IgG polyclonal | M |
| USP22 | Hs.462492 | Under development | | DFC |
| Wnt5A | Hs.152213 | Santa Cruz Biotechnology, Inc. | goat polyclonal | R |
| ZFP36 | Hs.343586 | Santa Cruz Biotechnology, Inc. | rabbit polyclonal | R |

Legend: PC, prostate carcinoma; M, metastasis signature; R, recurrence signature; DFC, death-from-cancer signature. Differential expression of genes listed in the table was confirmed by the Q-RT-PCR method using LCM dissected samples of malignant and adjacent normal tissues from prostate tumor samples.

SNP-based gene expression signatures predict therapy outcome in prostate and breast cancer patients. Our analysis demonstrates that CTOP genes are distinguished by a common population-specific SNP pattern and by potential utility as molecular predictors of cancer treatment outcome based on distinct profiles of mRNA expression. All gene expression models designed to predict cancer therapy outcome were developed using phenotype-based signature discovery protocols, i.e., the genetic loci comprising the predictive models were selected based on association of their expression profiles with the clinically relevant phenotype of interest. One of the implications of our analysis is that heritable genetic variations driven by a geographically localized form of natural selection determining population differentiations may have a significant impact on cancer treatment outcome by influencing the individual's gene expression profile. One of the predictions of this hypothesis is that genes whose expression levels are known to be regulated by SNP variations may provide good candidates for building gene expression-based CTOP models.

Consistent with this idea, we found that loci with genetically determined differences in mRNA expression levels among normal individuals (demonstrated by linkage analysis and by allelic associations of gene expression changes with SNP variations) generate statistically significant therapy outcome prediction models for breast and prostate cancer patients.

A hallmark feature of the common SNP pattern of CTOP genes is population-specific profiles of SNP allele and genotype frequencies. Most CTOP genes have multiple SNPs with population-specific genotype and allele frequencies, suggesting that CTOP genes may be targets for a geographically localized form of natural selection contributing to population differentiation. Consistent with this hypothesis, expression signatures of genes containing high-differentiation non-synonymous SNPs provide CTOP models for prostate and breast cancers. Similarly, expression signatures of genes representing loci in which natural selection most likely occurred appear highly informative in predicting therapy outcome in breast and prostate cancer patients. To further test the validity of this concept, we successfully used a common SNP pattern of CTOP genes to define novel gene expression models of cancer therapy outcome prediction without any input of mRNA expression data in the initial gene screening and selection process. Conversely, expression profiles of cancer-related genes with established SNP-based associations with incidence and severity of disease manifest therapy outcome prediction power (CYP3A4 for prostate cancer and SULT1A1 for breast cancer). An important end-point of this analysis with potential mechanistic implications is that patients with low expression levels of genes regulating catabolism of androgens (CYP3A4; prostate cancer), estrogens (SULT1A1; breast cancer) and thyroid hormones (DIO3; breast cancer) have a significantly increased likelihood of therapy failure.

Microarray analysis identifies clinically relevant cooperating oncogenic pathways associated with cancer therapy outcome. Bild et al., Nature 439: 353-357 (2006) provides compelling evidence of the power of microarray gene expression analysis in identifying multiple clinically relevant oncogenic pathways activated in human cancers. It provides a mechanistic explanation for the mounting experimental data demonstrating that there are multiple gene expression signatures predicting cancer therapy outcome in a given set of patients diagnosed with a particular type of cancer: the presence of multiple CTOP models most likely reflects deregulation of multiple oncogenic pathways, perhaps cooperating in the development of an oncogenic state.

We tested this hypothesis by comparing the cancer therapy outcome prediction power of three gene expression signatures derived from corresponding transgenic mouse models associated with activation of oncogenic pathways driven by the BMI1, Myc, and Her2/neu oncogenes during prostate and mammary carcinogenesis. To evaluate the prognostic power of the BMI1-, Myc-, and Her2/neu-pathway signatures, we made use of two previously published gene expression datasets for prostate and breast cancers (Glinsky, G. V. et al., J. Clin. Invest. 113: 913-923 (2004); van 't Veer et al., Nature 415: 530-536 (2002)). Application of the three signatures in combination clearly outperforms the individual signatures in stratifying patients into statistically distinct sub-groups based on likelihood of therapy failure. All cancer patients with evidence of activation of all three pathways (3 poor prognosis signatures) failed therapy, whereas patients with no evidence of even a single pathway activation remained disease-free.
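The combined use of the three pathway signatures can be sketched as a simple count of poor-prognosis calls per patient, followed by grouping patients by that count. The patient records below are invented for illustration, and the function names are hypothetical; how each individual signature call is derived from expression data is outside the scope of this sketch.

```python
from collections import defaultdict

PATHWAYS = ("BMI1", "Myc", "Her2/neu")


def count_poor_calls(calls):
    """Number of pathway signatures with a poor-prognosis (activated) call."""
    return sum(1 for pathway in PATHWAYS if calls.get(pathway, False))


def stratify(patients):
    """Group patients by number of poor-prognosis signatures and report the
    observed therapy-failure rate in each group."""
    groups = defaultdict(list)
    for patient in patients:
        groups[count_poor_calls(patient["calls"])].append(patient["failed"])
    return {
        n_calls: {
            "n_patients": len(outcomes),
            "failure_rate": sum(outcomes) / len(outcomes),
        }
        for n_calls, outcomes in sorted(groups.items())
    }


# Hypothetical patients: per-pathway signature calls and known outcome.
patients = [
    {"calls": {"BMI1": True, "Myc": True, "Her2/neu": True}, "failed": True},
    {"calls": {"BMI1": True, "Myc": False, "Her2/neu": False}, "failed": False},
    {"calls": {"BMI1": False, "Myc": False, "Her2/neu": False}, "failed": False},
    {"calls": {"BMI1": True, "Myc": True, "Her2/neu": True}, "failed": True},
]
summary = stratify(patients)
for n_calls, stats in summary.items():
    print(n_calls, stats)
```

Grouping by the number of activated pathways, rather than by any single signature, mirrors the comparison described above between patients with three poor-prognosis signatures and patients with none.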

These data suggest that in a sub-group of prostate and breast cancer patients with a therapy-resistant disease phenotype, concomitant activation of pathways driven by the BMI1, Myc, and Her2/neu oncogenes may contribute to development of a highly malignant, clinically lethal oncogenic state. Taken together with the data presented by Bild et al., supra, these results provide a strong rationale for translational application of microarray analysis in assisting physicians and patients during rational evidence-based selection of individualized target-tailored cancer therapies with the highest probability of cancer cure.

We tested the potential translational utility of this genome-wide approach to SNP analysis and gene expression profiling by building and retrospectively validating a CTOP algorithm integrating therapy outcome prediction calls of multiple phenotype-based and SNP-based molecular signatures of cancer treatment outcome. This CTOP algorithm seems highly promising for identification, at diagnosis, of prostate and breast cancer patients with 100% probability of a cure with existing therapy. It also allows selection of patients who would most likely benefit from the more aggressive adjuvant systemic treatment protocols currently prescribed for patients with advanced metastatic cancers or disease relapse. If confirmed in prospective clinical validation studies, this approach should enable the practical implementation of a concept of individualized target-tailored cancer therapies, allowing for rational evidence-based justification of the prescription of such therapies for a selected, genetically defined group of patients at diagnosis. Finally, our analysis provides a strong rationale for development of genetic prognostic tests for prediction of cancer therapy outcome based on SNP analysis and expression profiling of an individual's normal cells, such as blood cells.

In the human genome, a geographically localized form of natural selection causing population differentiation is reflected in population-specific signatures of a genome-wide SNP selection. Population differentiation is generally accepted as a clue to past selection in one of the populations, and 926 SNPs of this class have been described in the recent release of the HapMap project. Population-specific profiles of individual allele frequencies of the SNPs associated with CTOP genes suggest that cancer therapy outcome predictor genes can be found among genes carrying SNP-signatures of a genome-wide geographically localized form of natural selection causing population differentiation. Using these principles, we identified genes with a SNP pattern similar to known CTOP genes among genetic loci with population differentiation SNP variants. Importantly, mRNA expression profiles of these genes generate statistically significant gene expression models of cancer therapy outcome prediction. These models were built without any input of mRNA expression data in the initial gene screening and selection process.

Analysis of a haplotype map of the human genome indicates that the vast majority of heterozygous sites in each person's DNA will be explained by a limited set of common SNPs now contained (or captured through linkage disequilibrium, LD) in existing databases. Therefore, it is reasonable to assume that individual subjects within a population will likely carry unique combinations of the population-differentiation SNPs identified in this study (or SNPs in LD with the identified SNPs). We postulate that distinct patterns of population-differentiation SNPs associated with cancer-causing, cancer-associated, and CTOP genes would constitute important germ-line determinants of susceptibility, incidence, and severity of disease. Our analysis suggests that one of the main mechanisms of translation of SNP pattern diversity into disease phenotypes would be heritable SNP-driven variations in gene expression levels. Our analysis adds further support to recent data that SNP-driven effects on gene expression are seemingly spreading outside the boundaries of individual chromosomes and, perhaps, reaching a genome-wide scale. A majority of the SNPs identified in this study are intronic SNPs, suggesting that intronic SNPs may influence gene expression by an as yet unknown mechanism. Theoretically, intronic SNPs may influence gene expression by affecting a variety of processes such as chromatin silencing and remodeling, alternative splicing, transcription of microRNA genes, processivity of RNA polymerase, etc. The most likely mechanism of action would entail an effect on the stability and affinity of interactions between the DNA molecule and the corresponding multi-subunit complexes. Comparative genomics analysis has shown that about 5% of the human sequence is highly conserved across species, yet less than half of this sequence spans known functional elements such as exons. It is assumed that conserved non-genic sequences lack diversity because of selective constraint due to purifying selection; alternatively, such regions may be located in cold-spots for mutations. The most recent evidence shows that conserved non-genic sequences are not mutational cold-spots, and thus are of high interest for functional study. It would be of interest to determine whether population differentiation intronic SNPs overlap with such highly evolutionarily conserved non-genic sequences.

Our analysis provides a possible clue with regard to the mechanisms of genesis and evolution of disease-causing loci and the translation of SNP variations into disease phenotypes. A geographically localized form of natural selection drives the evolution of population differentiation SNP profiles, which is translated into phenotypic diversity by determining individual gene expression variations. Until recently, this selection-driven evolution in the human population was occurring within relatively restricted genetic pools due to travel and migration limitations, in the demographic context of close alignment of populations' reproductive longevity and overall lifespan. During the last century, rapid and dramatic socio-economic and demographic changes (explosion in travel and migration; increasing length of the individual's reproductive period; widening gap between reproductive longevity and life expectancy associated with a marked extension of continuous in vivo exposure of proliferating tissues to low levels of steroid hormones) altered the dynamic of these relationships in the human population, enhancing the probability of emerging disease-enabling combinations of SNP profiles.

Markers from Polycomb Group (PcG) Pathway

Preferred markers within the context of the present invention include the double positive BMI1/Ezh2 from the PcG pathway. The Polycomb group (PcG) gene BMI1 is required for the proliferation and self-renewal of normal and leukemic stem cells. Over-expression of the BMI1 oncogene causes neoplastic transformation of lymphocytes and plays an essential role in the pathogenesis of myeloid leukemia. Another PcG protein, Ezh2, was implicated in metastatic prostate and breast cancers, suggesting that PcG pathway activation is relevant for epithelial malignancies. Whether an oncogenic role of BMI1 and PcG pathway activation extends beyond leukemia and affects progression of solid tumors has previously remained unknown. Here it is demonstrated that activation of the BMI1 oncogene-associated PcG pathway plays an essential role in metastatic prostate cancer, thus mechanistically linking the pathogenesis of leukemia, self-renewal of stem cells, and prostate cancer metastasis.

To characterize the functional status of the PcG pathway in metastatic prostate cancer, advanced cell- and whole animal-imaging technologies, gene and protein expression profiling, stable siRNA-gene targeting, and tissue microarray (TMA) analysis in relevant experimental and clinical settings were utilized.

It was also demonstrated that in multiple experimental models of metastatic prostate cancer both BMI1 and Ezh2 genes are amplified and gene amplification is associated with increased expression of corresponding mRNAs and proteins. Images of human prostate carcinoma metastasis precursor cells isolated from blood were provided and shown to over-express both BMI1 and Ezh2 oncoproteins. Consistent with the PcG pathway activation hypothesis, increased BMI1 and Ezh2 expression in metastatic cancer cells is associated with elevated levels of H2AubiK119 and H3metK27 histones.

Quantitative immunofluorescence co-localization analysis and expression profiling experiments documented increased BMI1 and Ezh2 expression in clinical prostate carcinoma samples and demonstrated that high levels of BMI1 and Ezh2 expression are associated with markedly increased likelihood of therapy failure and disease relapse after radical prostatectomy. Gene-silencing analysis reveals that activation of the PcG pathway is mechanistically linked with highly malignant behavior of human prostate carcinoma cells and is essential for in vivo growth and metastasis of human prostate cancer. It is concluded that the results of experimental and clinical analyses indicate the important biological role of the PcG pathway activation in metastatic prostate cancer. It is suggested that the PcG pathway activation is a common oncogenic event in pathogenesis of metastatic solid tumors and provides the basis for development of small molecule inhibitors of the PcG chromatin silencing pathway as a novel therapeutic modality for treatment of metastatic prostate cancer.

Activation of PcG Protein Chromatin Silencing Pathway in Human Prostate Carcinoma Metastasis Precursor Cells.

The PcG pathway activation hypothesis implies that individual cells with activated chromatin silencing pathway would exhibit a concomitant nuclear expression of both BMI1 and Ezh2 proteins. Furthermore, cells with activated PcG pathway would manifest the increased expression levels of protein substrates targeted by the activation of corresponding enzymes to catalyze the H2A-K119 ubiquitination (BMI1-containing PRC1 complex) and H3-K27 methylation (Ezh2-containing PRC2 complex). Observations that increased BMI1 expression is associated with metastatic prostate cancer suggest that the PcG pathway might be activated in metastatic human prostate carcinoma cells. Consistent with this idea, previous independent studies documented an association of the increased Ezh2 expression with metastatic disease in prostate cancer patients. Therefore, immunofluorescence analysis was applied to measure the expression of protein markers of the PcG pathway activation in prostate cancer metastasis precursor cells isolated from blood of nude mice bearing orthotopic human prostate carcinoma xenografts.

Immunofluorescence analysis reveals that expression of all four individual protein markers of PcG pathway activation is elevated in blood-borne human prostate carcinoma metastasis precursor cells compared to the parental cells comprising a bulk of primary tumors. In order to document the PcG pathway activation in individual cells, the quantitative immunofluorescence co-localization analysis allowing for a simultaneous detection and quantification of several markers in a single cell was carried out. The quantitative immunofluorescence co-localization analysis demonstrates a marked enrichment of the population of blood-borne human prostate carcinoma metastasis precursor cells with the dual positive high BMI1/Ezh2-expressing cells.

These results were confirmed using two different mouse/rabbit primary antibody combinations for BMI1 and Ezh2 protein detection as well as different secondary fluorescent antibodies. Similar enrichment for the PcG pathway activated cells in a pool of circulating metastasis precursor cells is evident for other two-marker combination panels as well. In contrast to the protein markers of the PcG pathway activation, a significantly smaller fraction of cells expressing concomitantly high levels of the cytoplasmic AMACR/nuclear p63 proteins was detected in human prostate carcinoma metastasis precursor cells compared to the parental cell population. Therefore, the results of a quantitative immunofluorescence co-localization analysis indicate that measurements of several two-marker combinations demonstrate a significant enrichment of the population of prostate carcinoma metastasis precursor cells with the cells expressing high levels of the PcG pathway activation markers. Increased BMI1 and Ezh2 mRNA expression is associated with metastatic prostate cancer. Taken together these data support the hypothesis that PcG chromatin silencing pathway is activated in blood-borne human prostate carcinoma metastasis precursor cells and might contribute to the ability of metastatic cancer cells to survive and grow at distant sites.

Amplification of the BMI1 and Ezh2 Genes in Multiple Experimental Models of Human Prostate Cancer.

Increased expression of oncogenes is often associated with gene amplification. In agreement with the proposed oncogenic role of BMI1 and Ezh2 over-expression in human prostate carcinoma cells, a significant amplification of both the BMI1 and Ezh2 genes was documented in human prostate carcinoma cell lines representing multiple experimental models of metastatic prostate cancer (FIG. 8). Notably, the level of gene amplification, as determined by the measurement of DNA copy number for both BMI1 and Ezh2 genes, is higher in metastatic cancer cell variants compared to the non-metastatic or less malignant counterparts, suggesting that gene amplification may play a causal role in elevation of the BMI1 and Ezh2 oncoprotein expression levels and that high BMI1/Ezh2-expressing cells may acquire a competitive survival advantage during tumor progression.

PcG Pathway Activation Renders Circulating Human Prostate Carcinoma Metastasis Precursor Cells Resistant to Anoikis.

To ascertain the biological role of the PcG pathway activation in prostate cancer metastasis, human prostate carcinoma metastasis precursor cells were isolated from the blood of nude mice bearing orthotopic human prostate carcinoma xenografts, transfected with BMI1, Ezh2, or control siRNAs, and continuously monitored for mRNA and protein expression levels of BMI1, Ezh2, and a set of additional genes and protein markers using immunofluorescence analysis, RT-PCR, and Q-RT-PCR methods. Q-RT-PCR and RT-PCR analyses showed that siRNA-mediated BMI1-silencing caused ˜90% inhibition of the endogenous BMI1 mRNA expression. The effect of siRNA-mediated BMI1 silencing was validated at the protein expression level using immunofluorescence analysis (FIG. 9). The BMI1 silencing was specific since the expression levels of nine un-related transcripts were not altered (FIG. 9). Consistent with the hypothesis that expression of genes comprising the 11-gene death-from-cancer signature is associated with the expression of the BMI1 gene product, mRNA abundance levels of 8 of 11 interrogated BMI1-pathway target genes were altered in the human prostate carcinoma cells with siRNA-silenced BMI1 gene. For biological analysis we adopted the silencing protocol resulting in 80-100% reduction of the level of dual-positive BMI1/Ezh2 high-expressing metastasis precursor cells, thus yielding the cell population more closely resembling non-treated parental cells and markedly distinct from metastasis precursor cells treated with control siRNA (FIG. 9).

Reduction of the BMI1 mRNA and protein expression in human prostate carcinoma metastasis precursor cells did not significantly alter the viability of adherent cultures grown at the optimal growth condition and in serum starvation experiments. siRNA treatment had only a modest inhibitory effect on proliferation, causing a ˜25% reduction in the number of cells. However, the ability of human prostate carcinoma cells to survive in a non-adherent state was severely affected after siRNA-mediated reduction of the BMI1 expression (FIG. 9-1). FACS analysis revealed a ˜3-fold increase of apoptosis in the BMI1 siRNA-treated human prostate carcinoma cells cultured in non-adherent conditions (FIG. 9-2). These data suggest that human prostate carcinoma cells expressing high levels of the BMI1 protein are more resistant to apoptosis induced in cells of epithelial origin in response to attachment deprivation (anoikis). It is likely that these anoikis-resistant cancer cells would survive better in blood or lymph during metastatic dissemination, thus forming a pool of circulatory stress-surviving metastasis precursor cells. Similar results were obtained when Ezh2 silencing experiments were performed (FIG. 9-3), suggesting that targeting of either PRC1 or PRC2 complexes is sufficient for interference with the PcG pathway activity and inhibition of anoikis-resistance mechanisms in metastatic prostate carcinoma cells.

Targeted Depletion of Human Prostate Carcinoma Cells with Activated PcG Pathway Creates Population of Cancer Cells with Dramatically Diminished Malignant Potential In Vivo.

Results of the experiments demonstrate that a population of highly metastatic prostate carcinoma cells is markedly enriched for cancer cells expressing increased levels of multiple markers of the PcG pathway activation. These data suggest that carcinoma cells with activated PcG pathway may manifest a highly malignant behavior in vivo characteristic of cancer cell variants selected for increased metastatic potential. To test this hypothesis, blood-borne human prostate carcinoma metastasis precursor cells were treated with chemically modified stable siRNA targeting either BMI1 or Ezh2 mRNAs to generate a cancer cell population with diminished levels of dual positive high BMI1/Ezh2-expressing carcinoma cells. Stable siRNA-treated prostate carcinoma cells continue to grow in adherent culture in vitro for several weeks allowing for expansion of siRNA-treated cultures in quantities sufficient for in vivo analysis.

These observations also indicate that the treatment protocol was well-tolerated and was not detrimental to the general growth properties of a cancer cell population. Quantitative immunofluorescence co-localization analysis demonstrated that carcinoma cells after treatment with the BMI1- or Ezh2-targeting stable siRNA continue to express significantly lower levels of targeted proteins for an extended period of time (˜30-50% reduction at the 11-day post-treatment time point) compared to the cells treated with the control LUC siRNA. Importantly, the siRNA-treated human prostate carcinoma cell populations were essentially depleted of dual positive high BMI1/Ezh2-expressing carcinoma cells, thus setting the stage for critical in vivo analysis using a fluorescent orthotopic model of human prostate cancer metastasis in nude mice.

Remarkably, highly malignant human prostate carcinoma cell populations depleted of dual positive high BMI1/Ezh2-expressing cells demonstrated markedly diminished tumorigenic and metastatic potential in vivo. Within 3 weeks after inoculation of 1.5×10⁶ tumor cells, 100% of control animals developed rapidly growing, highly invasive and metastatic carcinomas in the mouse prostate, and all animals died within 50 days of the experiment. In contrast, only 20% of animals in both BMI1- and Ezh2-targeting therapy groups developed seemingly less malignant tumors causing death of hosts 78-87 days after tumor cell inoculation. Significantly, 150 days after tumor cell inoculation 83% and 67% of animals remain alive and disease-free in the therapy groups targeting the BMI1 and Ezh2 proteins, respectively (p=0.0007, Log rank test).
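Survival fractions of the kind quoted above are commonly summarized with a Kaplan-Meier estimate. The following is a minimal sketch of that estimator only; the function name `kaplan_meier`, the group of six animals, and the follow-up days are hypothetical illustrations and are not data from the experiments described here.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], died: List[bool]) -> List[Tuple[float, float]]:
    """Return (day, surviving fraction) after each distinct day of death.

    times[i] is the day of death or of last follow-up for animal i;
    died[i] is False when the animal was still alive on that day (censored).
    """
    curve = []
    surviving = 1.0
    death_days = sorted({t for t, d in zip(times, died) if d})
    for day in death_days:
        at_risk = sum(1 for t in times if t >= day)
        deaths = sum(1 for t, d in zip(times, died) if d and t == day)
        surviving *= 1.0 - deaths / at_risk
        curve.append((day, surviving))
    return curve


# Hypothetical group: two deaths, four animals alive at the end of follow-up.
days = [78, 87, 150, 150, 150, 150]
died = [True, True, False, False, False, False]
for day, fraction in kaplan_meier(days, died):
    print(f"day {day}: {fraction:.2f} surviving")
```

Animals still alive at the end of follow-up are treated as censored, so they count toward the number at risk without lowering the estimated surviving fraction.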

Increased Levels of Dual Positive High BMI1/Ezh2-Expressing Cells Indicate Activation of the PcG Pathway in a Majority of Human Prostate Adenocarcinomas.

To validate the significance of our findings for human disease, the quantitative immunofluorescence co-localization analysis was applied to measure the expression of BMI1 and Ezh2 proteins and to detect dual positive high BMI1/Ezh2-expressing carcinoma cells in clinical samples obtained from patients diagnosed with prostate adenocarcinomas. The results of this analysis demonstrate that a majority (79%-91% in different cohorts of patients) of human prostate tumors contain dual positive high BMI1/Ezh2-expressing carcinoma cells exceeding the threshold expression level in prostate samples from normal individuals. Interestingly, the panel of adenocarcinoma samples appears quite heterogeneous with respect to the relative levels of dual positive high BMI1/Ezh2-expressing cells. While in 50%-74% of prostate tumors the level of high BMI1-, high Ezh2-, or dual positive high BMI1/Ezh2-expressing cells was only slightly elevated (<15% of positive cells), a significant fraction (17%-29%) of prostate adenocarcinomas demonstrates a marked enrichment for dual positive high BMI1/Ezh2-expressing cells (>15% of positive cells).
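The classification described above can be sketched as a per-cell count followed by a 15% cut-off. This is only an illustration under stated assumptions: the function names, the per-cell intensity lists, and the per-marker cut-offs (which in practice would be derived from normal prostate samples) are hypothetical, and the text does not say how a tumor with exactly 15% positive cells is classified.

```python
from typing import Sequence


def dual_positive_fraction(
    bmi1: Sequence[float],
    ezh2: Sequence[float],
    bmi1_cutoff: float,
    ezh2_cutoff: float,
) -> float:
    """Fraction of cells whose BMI1 and Ezh2 signals both exceed their cut-offs."""
    if len(bmi1) != len(ezh2) or len(bmi1) == 0:
        raise ValueError("need one BMI1 and one Ezh2 value per cell")
    hits = sum(1 for b, e in zip(bmi1, ezh2) if b > bmi1_cutoff and e > ezh2_cutoff)
    return hits / len(bmi1)


def enrichment_class(fraction: float, marked_cutoff: float = 0.15) -> str:
    """Label a tumor by its share of dual positive cells (15% cut-off from the text)."""
    # Assumption: a tumor at exactly the cut-off is not counted as marked.
    return "marked enrichment" if fraction > marked_cutoff else "slight or no elevation"


# Hypothetical intensities for four cells; only the second cell is dual positive.
fraction = dual_positive_fraction([1, 5, 6, 2], [1, 7, 1, 9], bmi1_cutoff=3, ezh2_cutoff=3)
print(fraction, enrichment_class(fraction))
```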

Increased BMI1 and Ezh2 Expression is Associated with High Likelihood of Therapy Failure in Prostate Cancer Patients after Radical Prostatectomy.

Microarray analysis demonstrates that cancer patients with high levels of BMI1 and Ezh2 mRNA expression in prostate tumors have a significantly worse relapse-free survival after radical prostatectomy (RP) compared with the patients having low levels of BMI1 and Ezh2 expression (FIG. 10), suggesting that more profound alterations of the PcG protein chromatin silencing pathway in carcinoma cells are associated with a therapy-resistant, clinically lethal prostate cancer phenotype. FIG. 10-3 shows the Kaplan-Meier survival analysis of 79 prostate cancer patients stratified into five sub-groups using the eight-covariate cancer therapy outcome (CTO) algorithm (Table 2, below).

TABLE 2
8-covariate prostate cancer recurrence predictor model

Covariate      Coefficient   SE       Significance, P   Confidence interval, low 95%   Confidence interval, high 95%
BMI1           4.7732        1.5179   0.0017            1.798                          7.7483
Ezh2           0.4345        0.8215   0.5969            −1.1756                        2.0446
PRE RP PSA     0.0236        0.023    0.3054            −0.0215                        0.0686
RP GLSN SUM    0.2809        0.1955   0.1508            −0.1023                        0.6642
Capsular Inv   1.4752        0.7593   0.052             −0.0131                        2.9634
SM             0.7786        0.4641   0.0934            −0.1311                        1.6883
Sem Ves Inv    0.5876        0.4419   0.1836            −0.2785                        1.4538
AGE            0.041         0.0335   0.2214            −0.0247                        0.1066

RP, radical prostatectomy; PSA, prostate-specific antigen; GLSN SUM, Gleason sum; SM, surgical margins; Sem Ves Inv, seminal vesicle invasion; Capsular Inv, capsular invasion.
Overall model fit: Chi Square = 40.1250; df = 8; p < 0.0001.
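The columns of Table 2 can be cross-checked against one another. The sketch below assumes, since the table does not state it, that the intervals are Wald intervals (coefficient ± 1.96 × SE) and that the P values are two-sided normal P values of coefficient/SE; the function name `wald_summary` is illustrative. Under those assumptions the recomputed values agree with the reported ones to within rounding.

```python
import math

# (coefficient, SE, reported P, reported low 95%, reported high 95%) from Table 2
TABLE_2 = {
    "BMI1": (4.7732, 1.5179, 0.0017, 1.798, 7.7483),
    "Ezh2": (0.4345, 0.8215, 0.5969, -1.1756, 2.0446),
    "PRE RP PSA": (0.0236, 0.023, 0.3054, -0.0215, 0.0686),
    "RP GLSN SUM": (0.2809, 0.1955, 0.1508, -0.1023, 0.6642),
    "Capsular Inv": (1.4752, 0.7593, 0.052, -0.0131, 2.9634),
    "SM": (0.7786, 0.4641, 0.0934, -0.1311, 1.6883),
    "Sem Ves Inv": (0.5876, 0.4419, 0.1836, -0.2785, 1.4538),
    "AGE": (0.041, 0.0335, 0.2214, -0.0247, 0.1066),
}


def wald_summary(coef: float, se: float):
    """Two-sided normal P value and 95% Wald interval for one coefficient."""
    z = coef / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    half_width = 1.96 * se
    return p_value, coef - half_width, coef + half_width


for name, (coef, se, p_rep, lo_rep, hi_rep) in TABLE_2.items():
    p, lo, hi = wald_summary(coef, se)
    print(f"{name:13s} P={p:.4f} (reported {p_rep})  "
          f"CI=[{lo:.4f}, {hi:.4f}] (reported [{lo_rep}, {hi_rep}])")
```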

The multivariate Cox proportional hazards survival analysis was carried out to ascertain the prognostic power of measurements of BMI1 and Ezh2 expression in combination with known clinical and pathological markers of prostate cancer therapy outcome such as Gleason score, surgical margins, extra-capsular invasion, seminal vesicle invasion, serum PSA levels, and age. Of note, BMI1 expression level remains a statistically significant prognostic marker in the multivariate analysis (Table 2). Application of the 8-covariate prostate cancer recurrence model combining the incremental statistical power of individual prognostic markers appears highly informative in stratification of prostate cancer patients into sub-groups with differing likelihood of therapy failure and disease relapse after radical prostatectomy (FIG. 10). One of the distinctive features of this model is that it identifies a sub-group of prostate cancer patients comprising the bottom 20% of recurrence predictor scores and manifesting no clinical or biochemical evidence of disease relapse (FIG. 10). In contrast, 80% of patients in a sub-group comprising the top 20% of recurrence predictor scores failed therapy within a five-year period after radical prostatectomy.
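One plausible way to turn the Table 2 coefficients into a recurrence predictor score is the Cox linear predictor, a coefficient-weighted sum of the covariates, followed by ranking patients into five equal sub-groups. The sketch below assumes that form; the function names, the covariate coding (units of expression and PSA, 0/1 flags for invasion and margins), and the example patient are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Coefficients from Table 2.
COEFFICIENTS: Dict[str, float] = {
    "BMI1": 4.7732,
    "Ezh2": 0.4345,
    "PRE RP PSA": 0.0236,
    "RP GLSN SUM": 0.2809,
    "Capsular Inv": 1.4752,
    "SM": 0.7786,
    "Sem Ves Inv": 0.5876,
    "AGE": 0.041,
}


def recurrence_score(patient: Dict[str, float]) -> float:
    """Coefficient-weighted sum of the eight covariates (Cox linear predictor)."""
    return sum(coef * patient[name] for name, coef in COEFFICIENTS.items())


def quintile_groups(scores: List[float]) -> List[int]:
    """Assign each patient to group 1 (bottom 20% of scores) through 5 (top 20%)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, index in enumerate(order):
        groups[index] = rank * 5 // len(scores) + 1
    return groups


# Hypothetical patient: covariate values and their coding are assumptions.
patient = {name: 0.0 for name in COEFFICIENTS}
patient["BMI1"] = 1.0
patient["AGE"] = 60.0
print(round(recurrence_score(patient), 4))
print(quintile_groups([0.1 * i for i in range(10)]))
```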

Increasing experimental evidence suggests that the oncogenic role of BMI1 activation may extend beyond leukemia and, perhaps, play a key role in progression of epithelial malignancies and other solid tumors as well. One of the compelling examples revealing an association of the activated BMI1 oncoprotein-driven pathway(s) with a clinically lethal therapy-resistant malignant phenotype in patients diagnosed with multiple types of cancer is the identification of a death-from-cancer gene expression signature. An 11-gene signature distinguishes stem cells with normal self-renewal function from stem cells with drastically diminished self-renewal ability due to the loss of the BMI1 oncogene and is similarly expressed in metastatic prostate tumors. To date, the prognostic power of the 11-gene signature has been validated in multiple independent therapy outcome sets of clinical samples obtained from more than 2,500 cancer patients diagnosed with 12 different types of cancer, including six epithelial (prostate; breast; lung; ovarian; gastric; and bladder cancers) and five non-epithelial (lymphoma; mesothelioma; medulloblastoma; glioma; and acute myeloid leukemia, AML) malignancies.

These data suggest the presence of a conserved BMI1 oncogene-driven pathway, which is similarly activated in both normal stem cells and a highly malignant subset of human cancers diagnosed in a wide range of organs and uniformly exhibiting a marked propensity toward metastatic dissemination as well as a therapy resistance phenotype. Taken together with the results of the present study these data support the hypothesis that activation of the PcG chromatin silencing pathway is one of the key regulatory factors determining a cellular phenotype captured by the expression of a death-from-cancer signature in therapy-resistant clinically lethal malignancies.

Cancer cells with activated PcG pathway would be expected to exhibit a concomitantly high expression of both BMI1 and Ezh2 proteins. Furthermore, cells with activated PcG pathway would manifest the increased expression levels of protein substrates targeted by the activation of corresponding enzymes to catalyze the H2A-K119 ubiquitination (BMI1-containing PRC1 complex) and H3-K27 methylation (Ezh2-containing PRC2 complex). In this study, the relevance of this concept for metastatic prostate cancer was tested experimentally. A quantitative co-localization immunofluorescence analysis was applied to measure the expression of four distinct protein markers of the PcG pathway activation and demonstrated a concomitantly increased expression of all four markers in a sub-population of human prostate carcinoma metastasis precursor cells isolated from the blood of nude mice bearing orthotopic metastatic human prostate carcinoma xenografts. Presence of dual positive high BMI1/Ezh2-expressing cells appears essential for maintenance of tumorigenic and metastatic potential of human prostate carcinoma cells in vivo, since targeted depletion of dual positive high BMI1/Ezh2-expressing cells from a population of highly metastatic human prostate carcinoma cells treated with stable siRNAs generates a cancer cell population with dramatically diminished malignant potential in vivo.

Histone Markers within PcG Pathway

The BMI1 and Ezh2 proteins are members of the Polycomb group protein (PcG) chromatin silencing complexes conferring genome scale transcriptional repression via covalent modification of histones. The BMI1 PcG protein is a component of the hPRC1L complex (human Polycomb repressive complex 1-like), which was recently identified as the E3 ubiquitin ligase complex that is specific for histone H2A and plays a key role in Polycomb silencing. The ubiquitination/deubiquitination cycle of histones H2A and H2B is important in regulating chromatin dynamics and transcription mediated, in part, via ‘cross-talk’ between histone ubiquitination and methylation. Importantly, one of the up-regulated genes in the 11-gene death-from-cancer signature profile (Rnf2) plays a central role in the PRC1 complex formation and function, thus complementing the BMI1 function in the PRC1 complex. Rnf2 expression plays a crucial non-redundant role in development during a transient contact formation between PRC1 and PRC2 complexes via Rnf2 as described for Drosophila.

The Ezh2 protein is a member of the Polycomb PRC2 and PRC3 complexes with a histone lysine methyltransferase (HKMT) activity that is associated with transcriptional repression due to chromatin silencing. The HKMT-Ezh2 activity targets lysine residues on histones H1 and H3 (H3-K27 or H1-K26). H3-K27 methylation conferred by an active HKMT-Ezh2-containing complex is one of the key molecular events essential for chromatin silencing in vivo. Collectively, these data imply that in vivo Polycomb chromatin silencing pathway in distinct cell types would require a coordinate activation of multiple distinct PRC complexes. For example, Ezh2 associates with different EED isoforms thereby determining the specificity of histone methyltransferase activity toward histone H3-K27 or histone H1-K26. Collectively, these results suggest that coherent function of the PcG chromatin silencing pathway would require a concomitant coordinated activation of multiple protein components of PRC1, PRC2, and PRC3 complexes implying a coordinate regulation of expression of their essential components such as BMI1 and Ezh2 oncoproteins. It follows that dual positive high BMI1/Ezh2-expressing carcinoma cells with elevated expression of the H2AubiK119 and H3metK27 histones should be regarded as cells with activated PcG protein chromatin silencing pathway.

In human cells the BMI1-containing PcG complex forms a unique discrete nuclear structure that was termed the PcG bodies, the size and number of which in nuclei significantly varied in different cell types. Of note, the nuclei of dual positive high BMI1/Ezh2-expressing cells almost uniformly contain six prominent discrete PcG bodies, perhaps, reflecting the high level of the BMI1 expression and indicating the active state of the PcG protein chromatin silencing pathway. It has been shown recently that in cancer cells expressing high level of the Ezh2 protein the new type of the PcG chromatin silencing complex is formed containing the Sirt1 protein. This suggests that in high Ezh2-expressing carcinoma cells a distinct set of genetic loci could be repressed due to activation of the Ezh2/Sirt1-containing PcG chromatin silencing complex.

One of the notable features of dual positive high BMI1/Ezh2-expressing carcinoma cells is a prominent cytosolic expression of the Ezh2 oncoprotein (FIG. 8). Recent evidence revealed the existence of a cytosolic Ezh2-containing methyltransferase complex regulating actin polymerization and extra-nuclear signaling processes in various cell types. It is possible that both nuclear and extra-nuclear functions of the Ezh2-containing methyltransferase complex may play an important role in determining the malignant behavior of metastatic human prostate carcinoma cells. Recent observations directly demonstrated that the PcG repressive complexes PRC1 and PRC2 co-occupy a large set of genes in human and murine genomes, many of which are transcriptional developmental regulators. This suggests that repression of multiple developmental and differentiation pathways by Polycomb complexes may be required for maintaining stem cell pluripotency and adds further support to the idea that repression of critical developmental regulators by PcG proteins may play a crucial role in tumor progression and metastasis.

The results of our experiments indicate that the PcG pathway is frequently activated in human prostate tumors and is mechanistically linked to the highly malignant behavior of human prostate carcinoma cells in a xenograft model of prostate cancer metastasis. It remains to be elucidated whether, as in the xenograft model of human prostate cancer metastasis in nude mice, the PcG pathway activation is mechanistically associated with metastatic disease in prostate cancer patients as well. It also remains to be determined whether the level of enrichment of primary prostate tumors with dual positive high BMI1/Ezh2-expressing cancer cells correlates with the degree of PcG pathway activation and is informative in predicting the clinical behavior of prostate cancer in patients. Follow-up studies are expected to determine whether human prostate tumors manifesting markedly increased levels of dual positive high BMI1/Ezh2-expressing cells represent a therapy-resistant, clinically lethal type of prostate adenocarcinoma. This technology provides the basis for development of small molecule inhibitors of the PcG protein chromatin silencing pathway as a novel therapeutic modality for treatment of metastatic prostate cancer.

Stemness Pathway

Another pathway implicated in cancer progression is the “stemness” pathway. A cancer stem cell hypothesis proposes that the presence of rare stem cell-resembling tumor cells among the heterogeneous mix of cells comprising a tumor is essential for tumor progression and metastasis of epithelial malignancies. One of the implications of a cancer stem cell hypothesis is that similar genetic regulatory pathways might define critical stem cell-like functions in both normal and tumor stem cells.

Recent experimental and clinical observations identified the BMI1 oncogene-driven pathway(s) as one of the key regulatory mechanisms of “stemness” functions in both normal and cancer stem cells. The Polycomb group (PcG) gene BMI1 influences the proliferative potential of normal and leukemic stem cells and is required for the self-renewal of hematopoietic and neural stem cells. Self-renewal ability is one of the essential defining properties of a pluripotent stem cell phenotype. BMI1 oncogene is expressed in all primary myeloid leukemia and leukemic cell lines analyzed so far and over-expression of BMI1 causes neoplastic transformation of lymphocytes. Recent experimental observations documented an increased BMI1 expression in human non-small-cell lung cancer, human breast carcinomas and breast cancer cell lines, human medulloblastomas, prostate carcinomas, and gastrointestinal cancers, supporting the idea that an oncogenic role of the BMI1 activation may affect progression of the epithelial malignancies and other solid tumors as well.

Recent clinical genomics data provide powerful evidence supporting a cancer stem cell hypothesis and suggest that gene expression signatures associated with the “stemness” state of a cell (defined as phenotypes of self-renewal, asymmetrical division, and pluripotency) might be informative as molecular predictors of cancer therapy outcome. A mouse/human comparative cross-species translational genomics approach was utilized to identify an 11-gene signature that distinguishes stem cells with normal self-renewal function from stem cells with drastically diminished self-renewal ability due to the loss of the BMI1 oncogene and that consistently displays a normal stem cell-like expression profile in distant metastatic lesions, as revealed by the analysis of metastases and primary tumors in both a transgenic mouse model of prostate cancer and cancer patients.

Kaplan-Meier analysis confirmed that a stem cell-like expression profile of the 11-gene signature in primary tumors is a consistent powerful predictor of a short interval to disease recurrence, distant metastasis, and death after therapy in cancer patients diagnosed with twelve distinct types of cancer. These data suggest the presence of a conserved BMI1 oncogene-driven pathway, which is similarly activated in both normal stem cells and a clinically lethal therapy-resistant subset of human tumors diagnosed in a wide range of organs and uniformly exhibiting a marked propensity toward metastatic dissemination. Consistent with this idea, the essential role of the BMI1 oncogene activation in prostate cancer metastasis as well as in the maintenance of a self-renewal ability and high malignant potential of human breast cancer stem cells has been demonstrated. Cancer stem cells may indeed constitute metastasis precursor cells since most of the early disseminated carcinoma cells detected in the bone marrow of breast cancer patients manifest a breast cancer stem cell phenotype.

Recent genome-scale chromatin immunoprecipitation (ChIP) experiments and RNA interference analysis identified multiple critical pathways comprising an essential genetic regulatory circuitry of mouse and human embryonic stem cells (ESC). Similarly to the BMI1 knockout studies, in these experiments the self-renewal and proliferation functions of the normal stem cells appeared successfully uncoupled, thus allowing dissection of the critical regulatory pathways essential for maintenance of the self-renewal state of ESC and providing reliable models to study the relevance of the ESC-defined “stemness”/differentiation pathways to human cancer.

These advances were used to identify gene expression signatures of embryonic stem cells (ESC) during transition from self-renewing, pluripotent state to differentiated phenotypes in several experimental models of differentiation of human and mouse ESC. This analysis reveals multiple gene expression signatures of the ESC regulatory circuitry which appear highly informative in stratification of the early-stage breast, lung, and prostate cancer patients into sub-groups with dramatically distinct likelihood of therapy failure.

Genetic Signatures of Regulatory Circuitry of Embryonic Stem Cells (ESC) Identify Therapy-Resistant Phenotypes in Cancer Patients Diagnosed with Multiple Types of Epithelial Malignancies.

Recent discovery of death-from-cancer signature genes implies that genetic signatures associated with a “stemness” state (defined as phenotypes of asymmetrical division, pluripotency, and self-renewal) might be informative as molecular predictors of cancer therapy outcome (Glinsky et al., J. Clin. Invest. 115: 1503-1521 (2005)). The validity of this concept was tested while exploring the results of genome-wide microarray and chromatin immunoprecipitation analyses of several experimental models of differentiation of human and mouse ESC (Boyer et al., Cell 122: 947-956 (2005); Lee et al., Cell 125: 301-313 (2006); Bernstein et al., Cell 125: 315-326 (2006); Boyer et al., Nature 441: 349-353 (2006)).

Applying signature discovery principles to analysis of gene expression profiles during transition of ESC from a self-renewing, pluripotent state to differentiated phenotypes, seven gene expression signatures associated with a “stemness” epigenetic program of ESC were identified that appear highly informative in stratification of early-stage breast, prostate, and lung cancer patients into sub-groups with dramatically distinct likelihood of therapy failure. A cancer therapy outcome predictor (CTOP) algorithm employing a panel of “stemness” signatures [signatures of Nanog/Sox2/Oct4-, EED-, and Suz12-pathways; transposon exclusion zones (TEZ) and bivalent chromatin domains (BCD) signatures] and a Myc-driven “wound signature” demonstrates nearly 100% specificity and sensitivity in retrospective analysis of large independent cohorts of breast, prostate, lung, and ovarian cancer patients. To date, the retrospective analysis of the prognostic power of individual “stemness” signatures is being extended to more than 3,100 patients diagnosed with 12 distinct types of cancer (Table 3).

TABLE 3
Cancer types and number of cancer patients in clinical cohorts utilized for analysis of therapy outcome correlations with distinct expression profiles of the 11-gene BMI1-pathway signature

Cancer Type           Number of patients in the outcome sets   References
Prostate Cancer       220                                      J. Clin. Invest., 113: 913 (2004); Cancer Cell, 1: 203 (2002); PNAS, 101: 614 (2004); PNAS, 101: 811 (2004); JCO, 22: 2790 (2004); J. Clin. Invest., 115: 1503 (2005)
Breast Cancer         1171                                     Nature, 415: 530 (2002); NEJM, 347: 1999 (2002); PNAS, 100: 10393 (2003); Cancer Cell, 5: 507 (2004); PNAS, 100: 8419 (2003); Lancet, 381: 1590 (2003); Lancet, 365: 571 (2005); JCI, 115: 44 (2005); Nature, 439: 353 (2006)
Lung Cancer           340                                      PNAS, 98: 13790 (2001); Nature Medicine, 8: 815 (2002); Nature, 439: 353 (2006)
Gastric Cancer        39                                       PNAS, 99: 15203 (2002)
Ovarian Cancer        216                                      Clin. Cancer Res. 10: 3291 (2004); J. Soc. Gynecol. Investig. 11: 51 (2004)
Bladder Cancer        31                                       Nature Genetics, 33: 90 (2003)
Follicular Lymphoma   191                                      NEJM, 351: 2159 (2004)
Lymphoma (DLBCL)      298                                      NEJM, 346: 1937 (2002); Nature Medicine, 8: 68 (2002)
Mesothelioma          17                                       J. National Cancer Inst., 96: 698 (2003)
Medulloblastoma       60                                       Nature, 415: 436 (2002)
Glioma                50                                       Cancer Res., 53: 1602 (2003)
Lymphoma (MCL)        92                                       Cancer Cell, 3: 185 (2003)
AML                   401                                      NEJM, 350: 1605 (2004); NEJM, 350: 1617 (2004)
Total                 3176

The analysis demonstrates that therapy-resistant and therapy-responsive cancer phenotypes manifest distinct patterns of association with “stemness”/differentiation pathways, suggesting that therapy-resistant and therapy-responsive tumors develop within genetically distinct “stemness”/differentiation programs. These differences can be exploited for development of prognostic and therapy selection genetic tests utilizing microarray-based CTOP algorithm. One of the major regulatory pathways manifesting distinct patterns of association with therapy-resistant and therapy-responsive cancer phenotypes is the Polycomb group (PcG) proteins chromatin silencing pathway. RNAi-mediated targeting of the critical regulatory components of the PcG pathway in metastatic cancer cells eradicates disease in 67-83% of animals in a fluorescent orthotopic model of human prostate cancer metastasis in nude mice. To further validate the clinical relevance of these findings, the quantitative co-localization immunofluorescence analysis of the selected PcG proteins was carried out using TMA of more than 300 prostate tumors obtained from patients with known long-term clinical outcome after therapy. The analysis demonstrates that “stemness” pattern of the PcG pathway activation in prostate tumors is associated with the increased likelihood of therapy failure. Genetic signatures of “stemness” state identify therapy-resistant phenotypes in cancer patients diagnosed with multiple types of epithelial malignancies. These results provide powerful clinical evidence supporting the validity of the concept of cancer stem cells for human solid tumors.

Multiple Gene Expression Signatures of the ESC Regulatory Circuitry Predict Therapy Failure in Prostate Cancer Patients

Translational genomics data suggest that gene expression signatures associated with the “stemness” state of a cell might be informative as molecular predictors of cancer therapy outcome. Recent ChIP and RNA interference experiments identified multiple genetic pathways comprising an essential genetic regulatory circuitry of mouse and human embryonic stem cells. Similarly to the BMI1 knockout studies, in these experiments the self-renewal and proliferation functions of the normal stem cells were successfully uncoupled, thus providing reliable model systems dissecting the critical regulatory pathways essential for maintenance of the self-renewal state of ESC. These advances were used to study the relevance to human cancer of the multiple ESC-associated “stemness”/differentiation pathways defined in several experimental models of differentiation of human and mouse ESC.

Six large parent gene sets representing major genetic pathways associated with the essential regulatory circuitry of mouse and human ESC were selected for the initial analysis (Table 4).

TABLE 4 Classification performance of individual Polycomb pathway “stemness” signatures and CTOP “stemness” algorithms in predicting clinical outcome of prostate cancer

| Parent Gene Sets (Affymetrix Microarray Platform CTOP “stemness” signatures) | Number of Transcripts, Parent Gene Sets | Number of Transcripts, Prostate Cancer | Detection of failures, % | Log-rank test P value | Chi square | Hazard Ratio | 95% CI of ratio | Data Source |
|---|---|---|---|---|---|---|---|---|
| TEZ | 236 | 32 | 33/37 (89%) | <0.0001 | 54.03 | 16.12 | 6.925 to 28.29 | FIG. 17 |
| EED-pathway | 117 | 36 | 33/37 (89%) | <0.0001 | 52.73 | 15.7 | 6.691 to 27.28 | |
| Suz12/POLII | 79 | 22 | 33/37 (89%) | <0.0001 | 52.44 | 15.86 | 6.559 to 26.49 | FIG. 21 |
| Suz12 | 142 | 26 | 35/37 (95%) | <0.0001 | 66.58 | 34.87 | 9.343 to 38.38 | FIG. 21 |
| Nanog/Sox2/Oct4 | 164 | 28 | 33/37 (89%) | <0.0001 | 54.37 | 16.04 | 7.052 to 29.01 | FIG. 17 |
| PcG-TF | 176 | 21 | 33/37 (89%) | <0.0001 | 48.49 | 14.96 | 5.787 to 22.89 | FIG. 16 |
| BCD-TF | 73 | 31 | 33/37 (89%) | <0.0001 | 50.53 | 15.4 | 6.180 to 24.73 | FIG. 15 |
| ESC pattern 3 | 158 | 37 | 35/37 (95%) | <0.0001 | 72.9 | 37.19 | 11.30 to 47.95 | |
| BMI1 pathway | 199 | 11 | 28/37 (76%) | <0.0001 | 18.81 | 4.454 | 2.240 to 8.471 | FIG. 14 |
| PcG methylation | 98 | 35 | 33/37 (89%) | <0.0001 | 55.71 | 16.57 | 7.275 to 29.90 | FIG. 16 |
| Histone H3 | 20 | 20 | 29/37 (78%) | <0.0001 | 26.7 | 5.903 | 3.036 to 11.80 | This work |
| Histone H2A | 24 | 24 | 32/37 (86%) | <0.0001 | 41.44 | 11.08 | 4.767 to 18.71 | This work |
| Histones H3/H2A | 44 | 27 | 34/37 (92%) | <0.0001 | 59.97 | 21.97 | 8.103 to 33.46 | This work |
| Six ESC signatures | 914 | 165 | 37/37 (100%) | <0.0001 | 83.12 | Und | Undefined | This work |
| Eight ESC signatures | 1145 | 233 | 37/37 (100%) | <0.0001 | 83.12 | Und | Undefined | This work |
| Nine “stemness” signatures | 1344 | 244 | 37/37 (100%) | <0.0001 | 83.12 | Und | Undefined | This work |
| Ten “stemness” signatures | 1442 | 279 | 37/37 (100%) | <0.0001 | 81.18 | Und | Undefined | This work |
| Eleven “stemness” signatures | 1486 | 306 | 37/37 (100%) | <0.0001 | 81.18 | Und | Undefined | This work |

Legend: Seventy-nine prostate cancer patients, thirty-seven of which failed therapy within five years after radical prostatectomy and forty-two remain disease-free for at least five years, were stratified into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) groups based on the values of either individual CTOP scores (determined using weighted algorithm scores of the corresponding “stemness” signatures) or cumulative CTOP scores comprising the sum of the multiple individual signatures: Six ESC signatures (TEZ; EED; Suz12/POLII; Suz12; Nanog/Sox2/Oct4; PcG-TF signatures); Eight ESC signatures (six ESC signatures plus BCD-TF and ESC pattern3 signatures); Nine “stemness” signatures (eight ESC signatures plus BMI-pathway signature); Ten “stemness” signatures (nine “stemness” signatures plus PcG methylation signature); Eleven “stemness” signatures (ten “stemness” signatures plus Histones H3/H2A signature). Detection of failures (the number and percentage) was calculated as the number of cases that actually failed therapy and were classified by the CTOP algorithm into poor prognosis groups (top 50% scores) with relation to the total number of therapy failure cases in the data set. Microarray data sets and associated clinical information were reported elsewhere (5). Und, undefined due to the 100% cure rate in the good prognosis group.

These pathways were independently defined by different groups using distinct experimental approaches and protocols. Using multivariate Cox regression analysis, the prognostic power of these gene sets was interrogated, and it was found that all six gene sets provide highly informative signatures for stratification of prostate cancer patients into sub-groups with distinct likelihood of therapy failure (FIG. 22 and Table 4). To assess the comparative prognostic performance of the signatures, the individual Kaplan-Meier survival curves were evaluated using the same 50% cut-off level to divide the patients into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) sub-groups. It was found that all six signatures perform with similar accuracy in stratifying prostate cancer patients into sub-groups with statistically distinct probability of relapse after radical prostatectomy (FIG. 22). When the prognostic powers of the ESC-derived signatures were combined into a six-signature cancer therapy outcome predictor (CTOP) algorithm by adding the values of the individual CTOP scores, the resulting prognostic performance appears significantly improved, reaching nearly 100% accuracy (FIG. 22 and Table 4).
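The combination step described above (adding individual CTOP scores, splitting patients at the 50% cut-off, and counting detected failures) can be illustrated with a minimal sketch. The function names, the toy scores, and the rule of rounding the group size up are assumptions made for illustration only; the individual weighted CTOP scores are taken as given inputs rather than computed here.

```python
import math

def cumulative_ctop(scores_by_signature):
    """Sum the individual CTOP scores of each patient across signatures.

    scores_by_signature maps a signature name to a list of per-patient
    scores; all lists are assumed to share the same patient order.
    """
    return [sum(vals) for vals in zip(*scores_by_signature.values())]

def poor_prognosis_flags(cumulative, top_fraction=0.5):
    """Flag patients whose cumulative score ranks in the top fraction."""
    n_poor = math.ceil(top_fraction * len(cumulative))  # assumption: round up
    ranked = sorted(range(len(cumulative)),
                    key=lambda i: cumulative[i], reverse=True)
    poor = set(ranked[:n_poor])
    return [i in poor for i in range(len(cumulative))]

def detection_of_failures(flags, failed):
    """Return (failures placed in the poor prognosis group, all failures)."""
    detected = sum(1 for flag, fail in zip(flags, failed) if flag and fail)
    return detected, sum(failed)

# Hypothetical toy data: three signatures, six patients.
scores = {
    "TEZ":   [1.2, -0.5, 0.8, -1.1, 0.3, -0.9],
    "EED":   [0.9, -0.2, 0.4, -0.7, -0.6, -0.4],
    "Suz12": [1.0, 0.1, 0.6, -0.8, 0.2, -1.0],
}
failed = [True, False, True, False, True, False]

total = cumulative_ctop(scores)
flags = poor_prognosis_flags(total)
detected, n_failures = detection_of_failures(flags, failed)
print(f"Detection of failures: {detected}/{n_failures}")
```

In this toy case every failure falls in the top half of the cumulative scores, which mirrors the way the "Detection of failures" column is defined in the table legends.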

Gene Expression Signatures of the ESC Regulatory Circuitry Predict Therapy Failure in Multiple Independent Data Sets of Breast Cancer Patients.

The next step of the analysis sought to determine whether this approach would also be applicable to evaluation of therapy outcome in breast cancer patients. As in the prostate cancer data set, all six gene sets of the ESC regulatory circuitry generate gene expression-based predictors of the likelihood of treatment failure in breast cancer patients (FIG. 23 and Table 5).
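The Kaplan-Meier comparisons used throughout these analyses rest on the product-limit estimate of the probability of remaining failure-free. The following is a minimal sketch of that standard estimator, not the software used to produce the figures; the function name and the follow-up values are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the failure-free probability.

    times  : follow-up time of each patient
    events : True if therapy failure was observed, False if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up data (months) for five patients.
curve = kaplan_meier([6, 12, 12, 20, 30], [True, True, False, True, False])
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Applying the estimator separately to the poor prognosis and good prognosis groups yields the pair of curves that a Kaplan-Meier plot compares.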

TABLE 5 Classification performance of individual Polycomb pathway “stemness” signatures and CTOP “stemness” algorithms in predicting clinical outcome of the early-stage LN negative breast cancer (Affymetrix Microarray Platform)

| Parent Gene Sets (Affymetrix Microarray Platform CTOP “stemness” signatures) | Number of Transcripts, Parent Gene Sets | Number of Transcripts, Breast Cancer | Detection of failures, % | Log-rank test P value | Chi square | Hazard Ratio | 95% CI of ratio | Data Source |
|---|---|---|---|---|---|---|---|---|
| TEZ | 236 | 36 | 85/107 (79%) | <0.0001 | 60.1 | 5.191 | 3.131 to 6.778 | FIG. 17 |
| EED-pathway | 117 | 20 | 79/107 (74%) | <0.0001 | 41.46 | 3.704 | 2.413 to 5.217 | |
| Suz12/POLII | 79 | 20 | 82/107 (77%) | <0.0001 | 51.63 | 4.427 | 2.800 to 6.064 | FIG. 21 |
| Suz12 | 142 | 25 | 81/107 (76%) | <0.0001 | 46.63 | 4.092 | 2.603 to 5.623 | FIG. 21 |
| Nanog/Sox2/Oct4 | 164 | 41 | 87/107 (81%) | <0.0001 | 73.64 | 6.282 | 3.724 to 8.110 | FIG. 17 |
| PcG-TF | 176 | 30 | 81/107 (76%) | <0.0001 | 48.47 | 4.182 | 2.680 to 5.804 | FIG. 20 |
| BCD-TF | 73 | 26 | 82/107 (77%) | <0.0001 | 51.42 | 4.413 | 2.793 to 6.048 | FIG. 17 |
| ESC pattern 3 | 158 | 35 | 87/107 (81%) | <0.0001 | 72.67 | 6.218 | 3.679 to 8.009 | |
| BMI1 pathway | 199 | 11 | 67/107 (63%) | 0.0005 | 12.11 | 1.972 | 1.345 to 2.886 | FIG. 14 |
| PcG methylation | 98 | 22 | 87/107 (81%) | <0.0001 | 73.94 | 6.301 | 3.737 to 8.139 | FIG. 16 |
| Histone H3 | 20 | 13 | 72/107 (67%) | <0.0001 | 22.23 | 2.54 | 1.713 to 3.687 | This work |
| Histone H2A | 24 | 24 | 70/107 (65%) | <0.0001 | 19.53 | 2.378 | 1.618 to 3.482 | This work |
| Histones H3/H2A | 44 | 44 | 76/107 (71%) | <0.0001 | 31.98 | 3.113 | 2.063 to 4.447 | This work |
| Six ESC signatures | 914 | 172 | 94/107 (88%) | <0.0001 | 107.4 | 11.09 | 5.381 to 11.79 | This work |
| Eight ESC signatures | 1145 | 233 | 95/107 (89%) | <0.0001 | 112.3 | 12.17 | 5.651 to 12.40 | This work |
| Nine “stemness” signatures | 1344 | 244 | 97/107 (91%) | <0.0001 | 124.3 | 15.25 | 6.351 to 13.98 | This work |
| Ten “stemness” signatures | 1442 | 266 | 98/107 (92%) | <0.0001 | 127.7 | 17.01 | 6.538 to 14.37 | This work |
| Eleven “stemness” signatures | 1486 | 310 | 99/107 (93%) | <0.0001 | 132.1 | 19.31 | 6.793 to 14.93 | This work |

Legend: Two-hundred-eighty-six early-stage LN-negative breast cancer patients, one-hundred-seven of which failed therapy within five years after surgery and one-hundred-seventy-nine remain disease-free for at least five years, were stratified into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) groups based on the values of either individual CTOP scores (determined using weighted algorithm scores of the corresponding “stemness” signatures) or cumulative CTOP scores comprising the sum of the multiple individual signatures: Six ESC signatures (TEZ; EED; Suz12/POLII; Suz12; Nanog/Sox2/Oct4; PcG-TF signatures); Eight ESC signatures (six ESC signatures plus BCD-TF and ESC pattern3 signatures); Nine “stemness” signatures (eight ESC signatures plus BMI-pathway signature); Ten “stemness” signatures (nine “stemness” signatures plus PcG methylation signature); Eleven “stemness” signatures (ten “stemness” signatures plus Histones H3/H2A signature). Detection of failures (the number and percentage) was calculated as the number of cases that actually failed therapy and were classified by the CTOP algorithm into poor prognosis groups (top 50% scores) with relation to the total number of therapy failure cases in the data set. Microarray data sets and associated clinical information were reported elsewhere.

The individual predictors perform with similar prognostic classification accuracy, and the six-signature CTOP algorithm demonstrates significantly improved patient stratification performance compared to the individual signatures (FIG. 23 and Table 5). To validate the findings, the analysis was extended to four additional breast cancer therapy outcome data sets which were previously developed and analyzed in three independent institutions. As shown in FIG. 23, this analysis confirmed that the ESC-based CTOP algorithm is informative in multiple independent breast cancer therapy outcome data sets comprising altogether more than 900 breast cancer patients (FIG. 23 and Tables 5-7).
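The chi-square values reported in the tables come from a log-rank test comparing the poor prognosis and good prognosis groups. A minimal sketch of the standard two-group log-rank statistic is given here under stated assumptions: the function name and the toy follow-up data are hypothetical, and nothing in this sketch is claimed to reproduce the specific software or tabulated values of this work.

```python
def logrank_chi_square(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (one degree of freedom).

    times_*  : follow-up times in each group
    events_* : True if therapy failure was observed, False if censored
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the failure count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return (observed_a - expected_a) ** 2 / variance

# Hypothetical toy data: the poor prognosis group fails earlier.
chi2 = logrank_chi_square([1, 2], [True, True], [3, 4], [True, True])
print(f"chi-square = {chi2:.3f}")
```

The statistic grows as the observed failures in one group depart from the number expected if both groups shared the same failure rate, which is why larger chi-square values in the tables accompany stronger separation of the two groups.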

TABLE 6 Classification performance of individual Polycomb pathway “stemness” signatures and CTOP “stemness” algorithms in predicting clinical outcome of the early-stage LN negative breast cancer (Agilent Microarray Platform; clinical end-point: metastasis-free survival)

| Agilent Microarray Platform “Stemness” CTOP signatures | Number of Transcripts, Breast Cancer | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| TEZ signature | 17 | 37/46 (80%) | <0.0001 | 37 | 6.797 | 3.580 to 12.04 |
| EED-pathway | 22 | 36/46 (78%) | <0.0001 | 33.98 | 6.045 | 3.313 to 11.15 |
| Suz12/POLII | 21 | 39/46 (85%) | <0.0001 | 47.16 | 9.493 | 4.631 to 15.76 |
| Suz12 | 27 | 37/46 (80%) | <0.0001 | 36.59 | 6.724 | 3.545 to 11.93 |
| Nanog/Sox2/Oct4 | 38 | 39/46 (85%) | <0.0001 | 52.78 | 10.36 | 5.378 to 18.64 |
| PcG-TF signature | 28 | 33/46 (72%) | <0.0001 | 16.55 | 3.445 | 1.888 to 6.161 |
| BCD-TF | 26 | 39/46 (85%) | <0.0001 | 52.6 | 10.37 | 5.338 to 18.45 |
| BMI1 pathway | 11 | 31/46 (67%) | 0.0003 | 13.23 | 2.946 | 1.660 to 5.428 |
| PcG methylation | 29 | 43/46 (93%) | <0.0001 | 73.54 | 26.55 | 8.258 to 28.85 |
| Histone H3 | 14 | 31/46 (67%) | 0.0002 | 14.15 | 3.041 | 1.728 to 5.681 |
| Histone H2A | 15 | 33/46 (72%) | <0.0001 | 15.72 | 3.357 | 1.827 to 5.935 |
| Histones H3/H2A | 29 | 36/46 (78%) | <0.0001 | 29.23 | 5.451 | 2.865 to 9.484 |
| Six ESC signatures | 153 | 43/46 (93%) | <0.0001 | 75.11 | 27.11 | 8.547 to 29.95 |
| Ten “stemness” signatures | 248 | 44/46 (96%) | <0.0001 | 88.05 | 44.81 | 11.18 to 40.00 |

Legend: Ninety-seven early-stage LN-negative breast cancer patients, forty-six of which failed therapy within five years after surgery and fifty-one remain disease-free for at least five years, were stratified into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) groups based on the values of either individual CTOP scores (determined using weighted algorithm scores of the corresponding “stemness” signatures) or cumulative CTOP scores comprising the sum of the multiple individual signatures: Six ESC signatures (TEZ; EED; Suz12/POLII; Suz12; Nanog/Sox2/Oct4; PcG-TF signatures); Ten “stemness” signatures (six ESC signatures plus BCD-TF, BMI1-pathway, PcG methylation, and Histones H3/H2A signatures). Detection of failures (the number and percentage) was calculated as the number of cases that actually failed therapy and were classified by the CTOP algorithm into poor prognosis groups (top 50% scores) with relation to the total number of therapy failure cases in the data set. Microarray data sets and associated clinical information were reported elsewhere.

TABLE 7 Classification performance of individual Polycomb pathway “stemness” signatures and CTOP “stemness” algorithms in predicting clinical outcome of breast cancer (Agilent Microarray Platform; clinical end-point: death after therapy)

| Agilent Microarray Platform “Stemness” CTOP signatures | Number of Transcripts, Breast Cancer | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| TEZ signature | 17 | 63/79 (80%) | <0.0001 | 42.45 | 5.116 | 2.819 to 6.876 |
| EED-pathway | 22 | 66/79 (84%) | <0.0001 | 50.08 | 6.419 | 3.202 to 7.810 |
| Suz12/POLII | 21 | 62/79 (78%) | <0.0001 | 34.11 | 4.321 | 2.404 to 5.829 |
| Suz12 | 27 | 63/79 (80%) | <0.0001 | 41.40 | 5.021 | 2.768 to 6.753 |
| Nanog/Sox2/Oct4 | 38 | 66/79 (84%) | <0.0001 | 57.62 | 7.071 | 3.654 to 9.007 |
| PcG-TF signature | 28 | 62/79 (78%) | <0.0001 | 38.07 | 4.621 | 2.603 to 6.343 |
| BCD-TF | 26 | 57/79 (72%) | <0.0001 | 23.00 | 3.122 | 1.901 to 4.620 |
| BMI1 pathway | 11 | 60/79 (76%) | <0.0001 | 30.95 | 3.877 | 2.264 to 5.505 |
| PcG methylation | 29 | 65/79 (82%) | <0.0001 | 42.31 | 5.483 | 2.793 to 6.775 |
| Histone H3 | 14 | 51/79 (65%) | 0.0008 | 11.18 | 2.148 | 1.369 to 3.328 |
| Histone H2A | 15 | 60/79 (76%) | <0.0001 | 32.50 | 3.984 | 2.341 to 5.709 |
| Histones H3/H2A | 9 | 61/79 (77%) | <0.0001 | 36.30 | 4.348 | 2.529 to 6.186 |
| Six ESC signatures | 153 | 72/79 (91%) | <0.0001 | 80.42 | 14.33 | 5.010 to 12.34 |
| Nine “stemness” signatures | 219 | 72/79 (91%) | <0.0001 | 80.05 | 14.26 | 4.987 to 12.29 |
| Ten “stemness” signatures | 238 | 73/79 (92%) | <0.0001 | 85.38 | 17.07 | 5.347 to 13.19 |

Legend: Two-hundred-ninety-five breast cancer patients, seventy-nine of which died within five years after therapy and two-hundred-sixteen remain alive for at least five years, were stratified into poor prognosis (top 50% scores) and good prognosis (bottom 50% scores) groups based on the values of either individual CTOP scores (determined using weighted algorithm scores of the corresponding “stemness” signatures) or cumulative CTOP scores comprising the sum of the multiple individual signatures: Six ESC signatures (TEZ; EED; Suz12/POLII; Suz12; Nanog/Sox2/Oct4; PcG-TF signatures); Nine “stemness” signatures (six ESC signatures plus BCD-TF, BMI1-pathway, and PcG methylation signatures); Ten “stemness” signatures (six ESC signatures plus BCD-TF, BMI1-pathway, PcG methylation, and Histones H3/H2A signatures). Detection of failures (the number and percentage) was calculated as the number of cases that actually failed therapy and were classified by the CTOP algorithm into poor prognosis groups (top 50% scores) with relation to the total number of therapy failure cases in the data set. Microarray data sets and associated clinical information were reported elsewhere.

Transcription Factors as Markers within Oncogenic Pathways

The present invention can also be used to analyze the level of transcription factors as either an indicator of the presence of cancer or as a predictor of cancer therapy outcome. Details of transcription factor analysis are below.

Distinct Gene Expression Profiles of the Bivalent Chromatin Domain Transcription Factor Genes (BCD-TF) are Associated with Therapy-Resistant and Therapy-Sensitive Phenotypes of Human Prostate and Breast Cancers.

In genomes of somatic cells, nucleosomal compositions of histones harboring specific modifications of the histone tails define mutually exclusive transcriptionally active or silent states of the chromatin. Transcriptional status of corresponding genetic loci in genomes of most cells is governed by the nucleosome-defined chromatin patterns and strictly follows activation/repression rules. In contrast to somatic cells, in ESC multiple chromosomal regions were identified that simultaneously harbor both “silent” (H3K27met3) and “active” (H3K4) histone marks, and ~100 transcription factor (TF) encoding genes reside within these bivalent chromatin domain-containing chromosomal regions. Many of the bivalent chromatin domain (BCD)-containing genes were previously identified as the Polycomb Group (PcG) protein-target genes in both human and mouse ESC and are repressed or transcribed at low levels in ESC.

These observations form the basis for a hypothesis that transcriptional repression of BCD genes is essential for maintenance of the “stemness” state of ESC and that the unique BCD status of these genes makes them poised for rapid transcriptional activation during transition from the pluripotent self-renewing state of ESC to differentiated phenotypes.

Consistent with this idea, in differentiated cells the BCD pattern of these genes is resolved in either transcriptionally active or repressed chromatin domains and activated or repressed transcription of corresponding genes. It is noted that many BCD genes were also identified earlier as members of the core transcriptional regulatory circuitry of ESC manifesting the co-occupancy of their promoters by major “stemness” transcription factors. Furthermore, careful review of the available gene expression data sets of ESC in pluripotent self-renewing state reveals that several BCD-TF genes of this category are maintained in a transcriptionally active state.

This analysis suggests that expression of selected TF encoding genes in ESC, including bivalent chromatin domain-containing TF genes (BCD-TF), maintenance of a “stemness” state, and transition to differentiated phenotypes may be regulated by the balance of the “stemness” TFs such as Nanog, Sox2, Oct4, and PcG proteins bound to the promoters of target genes. If this is true, the “stemness” state of ESC should be associated with the unique profile of the BCD-TF expression comprising both up- and down-regulated transcripts that may be defined as the “stemness” BCD-TF signature (FIG. 24). It would be of interest to determine whether human tumors manifest a common pattern of the BCD-TF expression resembling a “stemness” profile of the BCD-TF signature.

Gene expression profiles of BCD-TF in clinical samples were independently generated for therapy-resistant breast and prostate tumors using multivariate Cox regression analysis of microarrays of tumor samples from 286 breast cancer and 79 prostate cancer patients with known long-term clinical outcome after therapy, and were tested for a concordant pattern. This analysis identified the thirteen-gene BCD-TF signature manifesting highly concordant gene expression profiles (r=0.853; P<0.001; FIG. 24) in breast and prostate tumors from patients with therapy-resistant disease phenotypes. Next, “stemness” gene expression profiles of BCD-TF in mouse ESC were derived by comparing microarray analyses of pluripotent self-renewing ESC (control ESC cultures treated with HP siRNA) versus ESC treated with Esrrb siRNA (day 6). At this time point, Esrrb siRNA-treated ESC do not manifest a “stemness” phenotype and form colonies of differentiated cells. Mouse genes comprising the “stemness” BCD-TF signature were translated into a set of human orthologs, and BCD-TF gene expression profiles of therapy-resistant clinical samples and ESC were tested for a concordant pattern. This analysis identified the eight-gene BCD-TF signature manifesting highly concordant expression profiles (r=0.716; P<0.001; FIG. 24) in ESC and therapy-resistant breast and prostate tumors. Kaplan-Meier analysis demonstrates that prostate and breast cancer patients with tumors harboring ESC-like expression profiles of the eight-gene BCD-TF signature are more likely to fail therapy (bottom two panels), suggesting that a sub-set of BCD-TF genes defined here as the eight-gene BCD-TF signature manifests “stemness” expression profiles in therapy-resistant prostate and breast tumors (FIG. 24).
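Concordance between two expression profiles, as summarized above by a correlation coefficient r, can be illustrated with a short sketch. It assumes that r denotes the Pearson correlation coefficient computed over per-gene expression changes; the function name and the eight per-gene values are hypothetical and are not data from this work.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical log-scale expression changes for eight genes:
# one profile from ESC, one from therapy-resistant tumors.
esc_profile = [1.4, -0.8, 0.9, -1.2, 0.5, -0.3, 1.1, -0.6]
tumor_profile = [1.0, -0.5, 1.2, -0.9, 0.1, -0.7, 0.8, -0.2]

r = pearson_r(esc_profile, tumor_profile)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates that genes up-regulated in one profile tend to be up-regulated in the other, which is the sense in which two profiles are called concordant.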

Therapy-Resistant and Therapy-Sensitive Tumors Manifest Distinct Gene Expression Profiles of the ESC “Stemness”/Differentiation Program.

The analysis suggests that therapy-resistant and therapy-sensitive tumors manifest distinct patterns of association with “stemness”/differentiation pathways engaged in ESC during transition from the pluripotent self-renewing state to differentiated phenotypes. One of the major implications of this hypothesis is the prediction that therapy-resistant and therapy-sensitive tumors develop within genetically distinct “stemness”/differentiation programs. This prediction was tested by interrogating the prognostic power of genes comprising the ESC pattern 3 “stemness”/differentiation program recently identified by a combination of RNA interference and gene expression analyses. It was found that, similarly to the BCD-TF signatures, the gene set comprising the ESC pattern 3 “stemness”/differentiation pathway generates gene expression signatures discriminating therapy-resistant and therapy-sensitive prostate and breast tumors (FIG. 25). These results support the hypothesis that therapy-resistant and therapy-sensitive cancers may develop within genetically distinct “stemness”/differentiation programs triggered by the altered balance of “stemness” TF and immediate down-stream changes in expression of the BCD-TF genes.

DNA Promoter Methylation Patterns as Markers within Oncogenic Pathways

The present invention can also be used to analyze the DNA promoter methylation patterns of genes within oncogenic pathways as either an indicator of the presence of cancer or as a predictor of cancer therapy outcome. Details of the analysis of DNA promoter methylation patterns of genes within oncogenic pathways are below.

Is Therapy-Resistant Phenotype of Human Epithelial Malignancies Associated with Distinct Methylation Patterns of the Polycomb Target Genes?

Recent experimental observations indicate that promoters of genes identified as the PcG targets in ESC are preferentially targeted for cancer-associated DNA hypermethylation and stable transcriptional repression in multiple types of human cancers. DNA promoter methylation patterns of the PcG target genes appear significantly distinct in different types of tumors, suggesting the presence of cancer type-specific profiles of DNA promoter hypermethylation, transcriptional repression, and mRNA expression of the PcG target genes. It was analyzed whether gene expression profiles of the PcG target genes whose promoters are hypermethylated in human cancers would be associated with distinct likelihood of therapy failure in prostate and breast cancer patients. The analysis utilized a set of 88 PcG target genes previously reported to be hypermethylated in cancer (FIG. 14). Multivariate Cox regression analysis demonstrates that PcG target genes with promoters frequently hypermethylated in cancer manifest distinct expression profiles associated with therapy-resistant and therapy-sensitive prostate and breast cancers (FIG. 26), implying that differences in gene expression between tumors with distinct outcome after therapy may be driven, in part, by the distinct promoter hypermethylation patterns of the PcG target genes. These differences can be exploited to generate highly informative gene expression signatures of the PcG target genes hypermethylated in cancer for stratification of prostate and breast cancer patients into sub-groups with statistically distinct likelihood of therapy failure (FIG. 26). This analysis suggests that therapy-resistant and therapy-sensitive tumors are likely to manifest different profiles of the promoter hypermethylation of PcG target genes and that these differences can be utilized for development of DNA-based diagnostic, prognostic, and individualized therapy selection tests.

Post-translational modifications of the histones H3 and H2A, in particular, trimethylation of the lysine 27 residue (H3met3K27) by the Ezh2-containing PRC2 complex and ubiquitination of the histone H2A by the BMI1-containing PRC1 complex, are consistently linked to the transcriptional silencing mediated by the PcG proteins and a cross-talk between Polycomb targeting and DNA promoter hypermethylation. It was therefore tested whether therapy-resistant and therapy-sensitive tumors would manifest distinct expression profiles of the histones H3 and H2A variants. Multivariate Cox regression analysis demonstrates that activation and inhibition of expression of distinct variants of the H3 and H2A histones are associated with tumors manifesting different outcome after therapy. Strikingly, gene expression signatures capturing expression profiles of the limited number of variants of a single protein (either histone H3 or histone H2A) appear informative in distinguishing prostate and breast cancer patients with statistically distinct probabilities of therapy failure (FIG. 26). Interestingly, cumulative CTOP scores comprising a sum of the individual CTOP scores of the H3, H2A, and PcG methylation signatures demonstrate improved patients' stratification performance compared to individual signatures (FIG. 26).

“Stemness” CTOP Algorithm Identifies Therapy-Resistant Phenotypes and Predicts the Likelihood of Treatment Failure in Prostate, Breast, Ovarian, and Lung Cancer Patients.

The analysis indicates that genetic components of the PcG chromatin silencing complexes as well as genes identified as either direct or immediate down-stream targets of the Polycomb pathway in ESC manifest distinct patterns of association with therapy-resistant and therapy-sensitive phenotypes of human prostate and breast cancers. To investigate the status of the Polycomb pathway in human tumors with distinct clinical outcome after therapy, we divided PcG pathway-associated genes into several functionally and/or structurally linked groups (Tables 4-8) and interrogated each gene set for gene expression pattern association with therapy-resistant phenotypes using multivariate Cox regression analysis.

TABLE 8 Classification performance of the CTOP algorithm comprising six Polycomb pathway ESC “stemness” signatures in predicting clinical outcome of breast cancer in multiple independent cohorts of patients

| Breast cancer Data Sets (Affymetrix and Agilent Microarray Platform) | Number of patients | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| Netherlands-286 | 286 (107) | 94/107 (88%) | <0.0001 | 107.4 | 11.09 | 5.381 to 11.79 |
| MSKCC-95 | 95 (33) | 31/33 (94%) | <0.0001 | 48.22 | 25.64 | 6.450 to 27.94 |
| DUKE-169 | 169 (52) | 47/52 (90%) | <0.0001 | 55.42 | 14.01 | 4.775 to 14.60 |
| Netherlands-97 | 97 (46) | 43/46 (93%) | <0.0001 | 75.11 | 27.11 | 8.547 to 29.95 |
| Netherlands-295 | 295 (79) | 65/79 (82%) | <0.0001 | 51.20 | 6.242 | 3.279 to 8.034 |
| Netherlands-295 | 295 (79) | 72/79 (91%) | <0.0001 | 80.42 | 14.33 | 5.010 to 12.34 |

Legend: The Affymetrix-based CTOP algorithms were developed using the Netherlands-286 data set and tested using the MSKCC-95 and Duke-169 data sets. The Agilent-based CTOP algorithms were developed using the Netherlands-97 data set and tested using the Netherlands-295 data set. The CTOP algorithms based on the cancer-specific death after therapy were developed using the Netherlands-295 data set (last row). In the Duke-169, MSKCC-95, and Netherlands-295 data sets the end-points are the overall survival and cancer-specific death. In the Netherlands-286 data set the end-points are the relapse-free survival. In the Netherlands-97 data set the end-points are metastasis-free survival.

This approach generates multiple gene expression signatures that are highly informative in stratification of cancer patients into sub-groups with statistically distinct likelihood of therapy failure (FIGS. 22-26). However, all of the signatures appear informative as therapy outcome predictors only for a fraction of patients, and none of the signatures seems sufficiently accurate and robust to serve as a prototype for diagnostic, prognostic, or therapy-selection applications. Therefore, it was tested whether a CTOP algorithm combining the prognostic power of individual gene expression signatures would be more informative as a molecular predictor of cancer treatment outcome (FIGS. 25 and 26). For each patient a cumulative CTOP score was calculated comprising a sum of nine individual CTOP scores derived from analysis of nine gene expression signatures (Tables 4-7). Next, the patients were ranked within each data set in descending order based on the values of the cumulative CTOP scores, each data set was divided into five sub-groups at 20% increments of the cumulative CTOP score values, and the Kaplan-Meier survival analysis was carried out (FIG. 27). This approach generates a highly informative CTOP algorithm stratifying cancer patients into five sub-groups with statistically distinct probabilities of therapy failure (FIG. 27). One of the striking features revealed by this analysis is the apparent applicability of this approach for development of gene expression-based CTOP algorithms for lung and ovarian cancer patients as well (FIG. 27).
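The five-group stratification described above can be sketched as follows. The sketch assumes that the 20% increments are taken over patient ranks, so that each sub-group holds one fifth of the patients; the function name and the toy scores are hypothetical.

```python
def quintile_groups(cumulative_scores):
    """Assign each patient to one of five rank-based sub-groups.

    Group 1 holds the highest 20% of cumulative CTOP scores and
    group 5 the lowest 20% (assumption: increments are by rank).
    """
    n = len(cumulative_scores)
    ranked = sorted(range(n),
                    key=lambda i: cumulative_scores[i], reverse=True)
    groups = [0] * n
    for rank, patient in enumerate(ranked):
        groups[patient] = rank * 5 // n + 1
    return groups

# Hypothetical cumulative CTOP scores for ten patients.
scores = [9.1, 0.5, 7.3, 2.2, 5.0, 3.3, 8.8, 1.0, 6.1, 4.4]
groups = quintile_groups(scores)
print(groups)
```

Each sub-group can then be passed to a Kaplan-Meier analysis separately, giving five curves per data set.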

TABLE 9 Classification performance of the CTOP algorithm comprising nine “stemness” signatures in predicting clinical outcome in prostate, breast, lung, and ovarian cancer patients

| Data Sets (Affymetrix and Agilent Microarray Platform) | Number of patients | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| Breast Cancer | 286 (107) | 97/107 (91%) | <0.0001 | 124.3 | 15.25 | 6.351 to 13.98 |
| Prostate Cancer | 79 (37) | 37/37 (100%) | <0.0001 | 83.12 | Und | Und |
| Lung Cancer | 91 (45) | 41/45 (91%) | <0.0001 | 84.64 | 22.92 | 11.69 to 44.23 |
| Ovarian Cancer | 133 (72) | 56/72 (78%) | <0.0001 | 78.47 | 7.592 | 6.272 to 17.81 |

Legend: The Affymetrix-based CTOP algorithms were developed separately for breast cancer and prostate cancer data sets. CTOP algorithm identified using breast cancer data set was applied to the lung cancer data set and ovarian cancer data set. In the ovarian cancer and lung cancer data sets the end-points are the overall survival and cancer-specific death. In the breast cancer data set the end-points are the disease-free survival. In the prostate cancer data set the end point is the relapse-free survival. In all data sets poor prognosis groups include patients with top 50% values of the cumulative CTOP scores in a given data set. Und, undefined due to the 100% cure rate in the good prognosis group. See text for details.

TABLE 10 Classification performance of the CTOP algorithm comprising nine “stemness” signatures in predicting clinical outcome in prostate, breast, lung, and ovarian cancer patients

| Data Sets (Affymetrix and Agilent Microarray Platform) | Number of patients | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| Breast Cancer | 286 (107) | 104/107 (97%) | <0.0001 | 96.59 | 34.31 | 4.663 to 10.04 |
| Prostate Cancer | 79 (37) | 37/37 (100%) | <0.0001 | 43.72 | Und | Und |
| Lung Cancer | 91 (45) | 44/45 (98%) | <0.0001 | 65.05 | 62.87 | 6.910 to 23.90 |
| Ovarian Cancer | 133 (72) | 71/72 (99%) | <0.0001 | 28.19 | 29.19 | 2.436 to 6.904 |

Legend: The Affymetrix-based CTOP algorithms were developed separately for breast cancer and prostate cancer data sets. CTOP algorithm identified using breast cancer data set was applied to the lung cancer data set and ovarian cancer data set. In the ovarian cancer and lung cancer data sets the end-points are the overall survival and cancer-specific death. In the breast cancer data set the end-points are the disease-free survival. In the prostate cancer data set the end point is the relapse-free survival. In all data sets, except ovarian cancer, poor prognosis groups include patients with top 60% values of the cumulative CTOP scores in a given data set. In ovarian cancer data set the poor prognosis group includes patients with top 80% cumulative CTOP score values. Und, undefined due to the 100% cure rate in the good prognosis group. See text for details.

TABLE 11 Classification performance of the CTOP algorithm comprising nine “stemness” signatures in predicting clinical outcome in prostate, breast, lung, and ovarian cancer patients

| Data Sets (Affymetrix and Agilent Microarray Platform) | Number of patients | Detection of failures, % | Log-rank test P values | Chi square | Hazard Ratio | 95% CI of ratio |
|---|---|---|---|---|---|---|
| Breast Cancer | 286 (107) | 104/107 (97%) | <0.0001 | 96.59 | 34.31 | 4.663 to 10.04 |
| Prostate Cancer | 79 (37) | 37/37 (100%) | <0.0001 | 43.72 | Und | Und |
| Lung Cancer | 91 (45) | 44/45 (98%) | <0.0001 | 65.05 | 62.87 | 6.910 to 23.90 |
| Ovarian Cancer | 133 (72) | 62/72 (86%) | <0.0001 | 57.15 | 7.890 | 4.040 to 10.74 |

Legend: The Affymetrix-based CTOP algorithms were developed separately for breast cancer and prostate cancer data sets. CTOP algorithm identified using breast cancer data set was applied to the lung cancer data set and ovarian cancer data set. In the ovarian cancer and lung cancer data sets the end-points are the overall survival and cancer-specific death. In the breast cancer data set the end-points are the disease-free survival. In the prostate cancer data set the end point is the relapse-free survival. In all data sets poor prognosis groups include patients with top 60% values of the cumulative CTOP scores in a given data set. Und, undefined due to the 100% cure rate in the good prognosis group. See text for details.

Validation of the PcG proteins Chromatin Silencing Pathway Involvement in Development of Therapy-Resistant Prostate Cancer.

The association of PcG protein chromatin silencing pathway activation with therapy-resistant cancer was investigated using alternative analytical approaches. Consistent with this idea, a quantitative immunofluorescent co-localization analysis demonstrates that a cancer stem cell-like CD44+/CD24− population isolated by sterile FACS sorting from the blood-borne PC3-32 human prostate carcinoma metastasis precursor cells is markedly enriched for dual-positive BMI1/Ezh2 high expressing cancer cells compared to the CD44+/CD24− population isolated from the parental PC3 cell line maintained in culture. Furthermore, a multi-color FISH analysis reveals that the blood-borne human prostate carcinoma metastasis precursor cell population contains a large proportion of cancer cells with high-level co-amplification of both BMI1 and Ezh2 genes (Table 12), suggesting that increased co-expression of the BMI1 and Ezh2 oncoproteins in these cells is driven by the co-amplification of the two oncogenes, BMI1 and Ezh2.

TABLE 12 FISH analysis of DNA copy numbers of the Polycomb Group BMI1 and Ezh2 genes in human prostate carcinoma cell lines (parental PC-3 cells and blood-borne PC-3-32 metastasis precursor cells) and diploid hTERT-immortalized human fibroblasts.

Cell line   Statistic    N     BMI1-Cy3    Ezh2-Cy5    Dual-positive,   N     BMI1-Cy5    Ezh2-Cy3    Dual-positive,
                                                       N (%)                                          N (%)
BJ-1        Average      52    2.333333    2           0                45    2.386364    2.533333    0
            STDEV              0.905388    0.709768                           1.125103    1.013545
PC-3        Average      74    2.125       4.125       1 (1.4%)         59    2.192308    4.482143    2 (3%)
            STDEV              1.090475    1.470492                           1.00738     2.071031
            T-test*            0.941271    5.13E−13                           0.393451    1.38E−08
PC-3-32     Average      99    3.597561    5.185185    33 (33%)         102   3.540816    5.490196    34 (33%)
            STDEV              1.638481    1.743298                           1.486492    1.733451
            T-test**           8.43E−09    7.24E−31                           1.49E−06    2.55E−25
            T-test***          7.49E−09    3.19E−08                           6.38E−10    0.002259

Dual-positive, N (%): nuclei with 5 or more copies of the Ezh2 gene and 4 or more copies of the BMI1 gene
T-test*, BJ-1 vs PC-3
T-test**, BJ-1 vs PC-3-32
T-test***, PC-3 vs PC-3-32
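
The dual-positive criterion stated in the Table 12 legend can be written as a simple per-nucleus rule. The following Python sketch is illustrative only: the function names and the example copy-number counts are hypothetical and are not data from this study. Only the two thresholds (5 or more Ezh2 copies, 4 or more BMI1 copies) are taken from the legend.

```python
from typing import Iterable, Tuple

# Thresholds taken from the Table 12 legend.
EZH2_MIN_COPIES = 5
BMI1_MIN_COPIES = 4


def is_dual_positive(bmi1_copies: int, ezh2_copies: int) -> bool:
    """Return True if a nucleus meets both copy-number thresholds."""
    return bmi1_copies >= BMI1_MIN_COPIES and ezh2_copies >= EZH2_MIN_COPIES


def dual_positive_summary(nuclei: Iterable[Tuple[int, int]]) -> Tuple[int, float]:
    """Count dual-positive nuclei and return (count, fraction of all nuclei)."""
    nuclei = list(nuclei)
    if not nuclei:
        raise ValueError("at least one scored nucleus is required")
    count = sum(is_dual_positive(bmi1, ezh2) for bmi1, ezh2 in nuclei)
    return count, count / len(nuclei)


# Hypothetical (BMI1, Ezh2) signal counts for five scored nuclei.
example_nuclei = [(2, 2), (4, 5), (3, 6), (5, 7), (2, 4)]
count, fraction = dual_positive_summary(example_nuclei)
print(f"{count} of {len(example_nuclei)} nuclei are dual-positive ({fraction:.0%})")
```

A nucleus with a high Ezh2 count but fewer than 4 BMI1 signals, such as (3, 6) above, is not counted, which is why PC-3 cells with an average of about 4 Ezh2 copies still show few dual-positive nuclei in Table 12.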

Finally, a multi-color quantitative immunofluorescent co-localization TMA analysis of 71 prostate carcinomas indicates that patients with tumors having increased levels (>1%) of dual-positive BMI1/Ezh2 high expressing cells manifest clinically aggressive disease phenotypes and are significantly more likely to develop disease recurrence after radical prostatectomy. Taken together with the previously reported experimental evidence of the essential role of PcG pathway activation in metastatic prostate cancer, these data strongly support the hypothesis of a causal association between Polycomb pathway activation and manifestation of the clinically lethal therapy-resistant prostate cancer phenotypes.

The analysis generated a “stemness” cancer therapy outcome predictor (CTOP) algorithm comprising a combination of nine signatures [signatures of BMI1-, Nanog/Sox2/Oct4-, EED-, and Suz12-pathways; transposon exclusion zones (TEZ) and ESC pattern 3 signatures; signatures of polycomb-bound transcription factors (PcG-TF) and bivalent chromatin domain transcription factors (BCD-TF)]. The “stemness” CTOP algorithm demonstrates nearly 100% prognostic accuracy for a majority of patients in retrospective analysis of large cohorts of breast, prostate, lung, and ovarian cancer patients, suggesting that therapy-resistant and therapy-sensitive tumors develop within genetically distinct “stemness”/differentiation programs driven by engagement of the PcG proteins chromatin silencing pathway. The signatures of the PcG pathway appear highly informative in stratification of early-stage breast, lung, and prostate cancer patients into sub-groups with dramatically distinct likelihoods of therapy failure. The findings and conclusions were validated by applying alternative analytical techniques and methodologies of PcG pathway analysis in cell culture experiments, animal models of cancer metastasis, and clinical tumor samples, including a variety of protein expression assays using combinations of immunofluorescence, FACS, and tissue microarray techniques. Taken together, the analysis indicates that the epigenetic landscape of therapy-resistant human cancers is defined to a significant extent by the activation of the PcG protein chromatin silencing pathway and heritable imprinting of a stem cell-like epigenetic program via cross-talk between the PcG pathway and DNA promoter hypermethylation.

Clinical genomics data suggest that gene expression signatures associated with the “stemness” state of a cell might be informative as molecular predictors of cancer therapy outcome. This hypothesis was tested by applying the signature discovery principles to genomic analysis of human and mouse ESC during transition from the self-renewing, pluripotent state to differentiated phenotypes in several experimental models of ESC differentiation. Collectively, the data suggest that therapy-resistant and therapy-sensitive tumors develop within genetically distinct “stemness”/differentiation programs. To date, the retrospective analysis of the prognostic power of individual “stemness” signatures has been extended to more than 3,100 patients diagnosed with 13 distinct types of cancer, supporting the conclusion that therapy-resistant and therapy-responsive cancer phenotypes manifest distinct patterns of association with “stemness”/differentiation pathways.

Taken together, the analysis further supports the existence of a transcriptionally discernible type of human cancer detectable in a sub-group of early-stage cancer patients diagnosed with distinct epithelial malignancies appearing in multiple organs. These early-stage carcinomas of seemingly various origins appear to exhibit a poor therapy outcome gene expression profile, which is uniformly associated with increased propensity to develop metastasis, high likelihood of treatment failure, and increased probability of death from cancer after therapy. Cancer patients who fit this transcriptional profile might represent a genetically, biologically, and clinically distinct type of cancer exhibiting highly malignant clinical behavior and a therapy resistance phenotype even at the early stage of tumor progression. It has been suggested that one of the characteristic features of this early-stage, therapy-resistant metastatic cancer is the transcriptional (and, perhaps, biological) resemblance to normal stem cells. A stem cell cancer hypothesis has been proposed to explain a possible mechanistic contribution of normal stem cells to the pathogenesis of this type of human cancer. According to this hypothesis, a genetically defined sub-set of transformed cells (perhaps arising with higher probability in a genetically defined human sub-population) forms tumors with high tropism toward normal stem cells (NSCs) mediated by molecules collectively defined as “presence of wound” and/or “hypoxia” signals. Enrichment of primary tumors with NSCs increases the likelihood of horizontal genomic transfer (large-scale transfer of DNA and chromatin) between NSCs and tumor cells via cell fusion and/or uptake of apoptotic bodies. Reprogrammed somatic hybrids of tumor cells and NSCs acquire a transformed phenotype and an epigenetic self-renewal program. 
Postulated progeny of hybrid cells contains a sub-population of self-renewing cancer stem cells with epigenetic and transcriptional markers of NSCs and high propensity toward metastatic dissemination. Recent experimental observations demonstrate direct involvement of the bone marrow-derived cells in development of breast and colon cancers in transgenic mouse cancer models suggesting that cancer stem cells can originate from the bone marrow-derived cells.

The analysis highlights the significant challenges associated with the prospect of practical implementation of the concept of personalized medicine in clinical oncology settings. Many of these challenges are based on a fundamental reality of a biological context defined by the multigenic nature of human cancers and its implications for diagnostic, prognostic (inter-patient and intra-tumor heterogeneities; requirements for multi-signature diagnostic, prognostic, and therapy selection algorithms), and therapeutic applications (the eventual necessity for highly individualized combinations of cancer therapeutics for simultaneous targeting of relevant oncogenic and stemness pathways to reduce the probability of selection of therapy-resistant phenotypes). One such unanticipated near-term health care management and regulatory implication for successful clinical implementation of the concept of personalized cancer therapies revealed by the analysis is the importance of physicians' unrestricted ability to prescribe and exercise, in a routine clinical setting, an off-label use of FDA-approved drugs.

One of the important end-points of our work is the development of a concise catalog of gene expression changes comprising ~300 human genes divided into nine signatures and reflecting a transcriptional pathology of “stemness”/differentiation pathways associated with therapy-resistant phenotypes of human solid tumors. One of the significant advantages of having such a “stemness” catalog available is the potential to exploit this information for a therapeutic gain in the effort to target clinically lethal states of malignant phenotypes. Therefore, the potential therapeutic utility of the association between “stemness” and therapy-resistant cancer phenotypes was evaluated by exploring the connectivity map (CMAP) of “stemness” pathways in human solid tumors with distinct clinical outcome after therapy. A CMAP-based search for cancer therapeutics targeting “stemness” pathways in solid tumors reveals drug combinations causing transcriptional reversal of “stemness” signatures associated with therapy-resistant phenotypes of epithelial cancers. CMAP analysis demonstrates that a combination of a PI3K pathway inhibitor, an estrogen receptor (ER) antagonist, and an mTOR inhibitor causes transcriptional reversal of “stemness” signatures in 35 of 37 (95%) patients diagnosed with therapy-resistant prostate cancer. CMAP-based design of target-tailored individualized breast cancer therapies reveals drug combinations causing transcriptional reversal of “stemness” signatures in 91 of 107 (85%) of the early-stage breast cancer patients with therapy-resistant disease phenotypes. A combination of a PI3K pathway inhibitor, an ER antagonist, and an HDAC inhibitor causes transcriptional reversal of “stemness” pathways in 53 of 107 (49.5%) patients diagnosed with early-stage therapy-resistant breast cancer. 
Similarly, CMAP-based analysis of target-tailored individualized therapies for lung cancer reveals drug combinations causing transcriptional reversal of “stemness” signatures in 39 of 45 (87%) of the early-stage lung cancer patients with therapy-resistant tumor phenotypes. The connectivity map-based approach outlined in this work for the discovery of small molecule drugs targeting clinical phenotype-associated gene expression signatures may be useful for multiple therapeutic applications beyond therapy-resistant human malignancies.

The analysis seems to indicate that several individual drugs and/or their analogs which are already either FDA approved for clinical use or in late-stage clinical trials may have promising therapeutic potential against therapy-resistant, clinically lethal forms of human cancers. Therefore, the findings may have a significant near-term impact on the design and conduct of clinical trials for evaluation of the efficacy of novel personalized target-tailored combinations of cancer therapeutics designed to target therapy-resistant phenotypes of human solid tumors by applying evidence-based rational selection principles during the design stage of drug combinations. These findings will likely have a near-term impact on protocols for the design and execution of clinical trials for novel cancer therapeutics, including the regulatory guidelines for patients' eligibility requirements at the enrollment stage. This should allow the execution of such protocols in the most cost-efficient way and with the maximum potential benefit for patients by facilitating the selection for a trial of populations at high risk of failure of existing therapy. Another conclusion from our analysis with major health care management and regulatory implications is that near-term progress in practical implementation of the concept of personalized cancer therapies would depend on physicians' ability to select, prescribe, and exercise in a routine clinical setting an off-label use of FDA-approved drugs. In this context, the timely delivery of relevant scientific information to practicing physicians and the dynamic evolution of a supporting regulatory environment adherent to state-of-the-art scientific evidence would be of paramount importance.

The following examples are intended to further illustrate certain embodiments of the invention and are not intended to limit the scope of the invention.

Example 1 Preparation of Clinical Samples

Two clinical outcome sets comprising 21 (outcome set 1) and 79 (outcome set 2) samples were utilized for analysis of the association of the therapy outcome with expression levels of the BMI1 and Ezh2 genes and other clinico-pathological parameters. Expression profiling data of primary tumor samples obtained from 1243 microarray analyses of eight independent therapy outcome cohorts of cancer patients diagnosed with four types of human cancer were analyzed in this study. Microarray analysis and associated clinical information for clinical samples analyzed in this work were previously published and are publicly available.

Prostate tumor tissues comprising the clinical outcome data set were obtained from 79 prostate cancer patients undergoing therapeutic or diagnostic procedures performed as part of routine clinical management at the Memorial Sloan-Kettering Cancer Center (MSKCC). Clinical and pathological features of the 79 prostate cancer cases comprising the validation outcome set are presented elsewhere. Median follow-up after therapy in this cohort of patients was 70 months. Samples were snap-frozen in liquid nitrogen and stored at −80° C. Each sample was examined histologically using H&E-stained cryostat sections. Care was taken to remove nonneoplastic tissues from tumor samples. Cells of interest were manually dissected from the frozen block, trimming away other tissues. Overall, 146 human prostate tissue samples were analyzed in this study, including forty-six samples in a tissue microarray (TMA) format. TMA samples analyzed in this study were exempt according to the NIH guidelines.

In addition, we carried out the analysis of gene expression profiling data from 942 microarray experiments derived from five different breast cancer therapy outcome data sets. Expression profiling data for tumor samples obtained from 91 lung adenocarcinoma patients, 169 breast cancer patients, and 133 ovarian cancer patients were analyzed in this study. The original microarray analyses as well as associated clinical information for these samples were reported elsewhere. Primary gene expression data files of clinical samples as well as associated clinical information can be found in corresponding papers. To date the cancer therapy outcome database includes 3,176 therapy outcome samples from patients diagnosed with thirteen distinct types of cancers (Table 3): prostate cancer (220 patients); breast cancer (1171 patients); lung adenocarcinoma (340 patients); ovarian cancer (216 patients); gastric cancer (89 patients); bladder cancer (31 patients); follicular lymphoma (191 patients); diffuse large B-cell lymphoma (DLBCL, 298 patients); mantle cell lymphoma (MCL, 92 patients); mesothelioma (17 patients); medulloblastoma (60 patients); glioma (50 patients); acute myeloid leukemia (AML, 401 patients).

Example 2 Cell Culture

Cell lines used in this study were previously described in Glinsky et al., Cancer Lett., 201: 67-77 (2003). The LNCaP- and PC-3-derived cell lines were developed by consecutive serial orthotopic implantation, either from metastases to the lymph node (for the LN series), or reimplanted from the prostate (Pro series). This procedure generated cell variants with differing tumorigenicity, frequency and latency of regional lymph node metastasis. Except where noted, cell lines were grown in RPMI1640 supplemented with 10% FBS and gentamycin (Gibco BRL) to 70-80% confluence and subjected to serum starvation as described, or maintained in fresh complete media, supplemented with 10% FBS. Growth inhibitory experiments were carried out in the 96-well format based on Hoechst staining for the estimate of live cell counts using high-throughput robotics of the Target and Drug Discovery Facility (TDDF) of the Ordway Research Institute Cancer Center. Chemicals, reagents, and drugs were purchased from Sigma, except where indicated otherwise.

Example 3 Anoikis Assay

Cells were harvested by 5-min digestion with 0.25% trypsin/0.02% EDTA (Irvine Scientific, Santa Ana, Calif., USA), washed and resuspended in serum free medium. Cells at a concentration of 1.7×10⁵ cells/well in 1 ml of serum free medium were plated in 24-well ultra low attachment polystyrene plates (Corning Inc., Corning, N.Y., USA) and incubated at 37° C. and 5% CO₂ overnight. Viability of cell cultures subjected to anoikis assays was >95% in the Trypan blue dye exclusion test.

Example 4 Apoptosis Assay

Apoptotic cells were identified and quantified using the Annexin V-FITC kit (BD Biosciences Pharmingen) per the manufacturer's instructions. The following controls were used to set up compensation and quadrants: 1) Unstained cells; 2) Cells stained with Annexin V-FITC (no PI); 3) Cells stained with PI (no Annexin V-FITC). Each measurement was carried out in quadruplicate and each experiment was repeated at least twice. Annexin V-FITC positive cells were scored as early apoptotic cells; both Annexin V-FITC and PI positive cells were scored as late apoptotic cells; unstained Annexin V-FITC and PI negative cells were scored as viable or surviving cells. In selected experiments apoptotic cell death was documented using the TUNEL assay.

Example 5 Flow Cytometry

Cells were washed in cold phosphate-buffered saline (PBS) and stained according to the manufacturer's instructions using the Annexin V-FITC Apoptosis Detection Kit (BD Biosciences, San Jose, Calif., USA) or appropriate antibodies for cell surface markers. Flow analysis was performed on a FACS Calibur instrument (BD Biosciences, San Jose, Calif., USA). Cell Quest Software was used for data acquisition and analysis. All measurements were performed under the same instrument settings, analyzing 10³-10⁴ cells per sample.

Example 6 Tissue Processing for mRNA and RNA Isolation

Fresh frozen orthotopic and transgenic primary tumors, metastases, and mouse prostates were examined by use of hematoxylin and eosin stained frozen sections as described previously. Orthotopic tumors of all sublines exhibited similar morphology consisting of sheets of monotonous closely packed tumor cells with little evidence of differentiation, interrupted by only occasional zones of largely stromal components, vascular lakes, or lymphocytic infiltrates. Fragments of tumor judged free of these non-epithelial clusters were used for mRNA preparation. Frozen tissue (1-3 mm×1-3 mm) was submerged in liquid nitrogen in a ceramic mortar and ground to powder. The frozen tissue powder was dissolved and immediately processed for mRNA isolation using a Fast Tract kit for mRNA extraction (Invitrogen, Carlsbad, Calif., see above) according to the manufacturer's instructions.

RNA and mRNA extraction: For gene expression analysis, cells were harvested in lysis buffer 2 hrs after the last media change at 70-80% confluence and total RNA or mRNA was extracted using the RNeasy (Qiagen, Chatsworth, Calif.) or FastTract kits (Invitrogen, Carlsbad, Calif.). Cell lines were not split more than 5 times prior to RNA extraction, except where noted. Detailed protocols were described elsewhere.

Affymetrix arrays: The protocol for mRNA quality control and gene expression analysis was that recommended by Affymetrix. In brief, approximately one microgram of mRNA was reverse transcribed with an oligo(dT) primer that has a T7 RNA polymerase promoter at the 5′ end. Second strand synthesis was followed by cRNA production incorporating a biotinylated base. Hybridization to Affymetrix U95Av2 arrays representing 12,625 transcripts overnight for 16 h was followed by washing and labeling using a fluorescently labeled antibody. The arrays were read and data processed using Affymetrix equipment and software as reported previously.

Data analysis: Detailed protocols for data analysis and documentation of the sensitivity, reproducibility and other aspects of the quantitative statistical microarray analysis using Affymetrix technology have been reported. In these experiments, 40-50% of the surveyed genes were called present by the Affymetrix Microarray Suite 5.0 software. The concordance analysis of differential gene expression across the data sets was performed using Affymetrix MicroDB v. 3.0 and DMT v.3.0 software as described earlier. The microarray data were processed using the Affymetrix Microarray Suite v.5.0 software, and statistical analysis of the expression data sets was performed using the Affymetrix MicroDB and Affymetrix DMT software. The Pearson correlation coefficient for individual test samples and the appropriate reference standard was determined using Microsoft Excel and GraphPad Prism version 4.00 software. The significance of the overlap between the lists of stem cell-associated and prostate cancer-associated genes was calculated by using the hypergeometric distribution test. The Multiple Experiments Viewer (MEV) software version 3.0.3 of the Institute for Genomic Research (TIGR) was used for clustering algorithm data analysis and visualization.
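
The hypergeometric overlap test mentioned above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the software used in this work: the function name is hypothetical, and the two list sizes and the overlap in the usage example are invented for illustration. Only the population size of 12,625 transcripts is taken from the U95Av2 array description.

```python
from math import comb


def hypergeom_overlap_pvalue(population: int, list_a: int, list_b: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap) for two gene lists drawn from one population.

    X is the number of genes shared by a list of size list_b drawn at random
    from a population that contains list_a genes of interest.
    """
    if not (0 <= overlap <= min(list_a, list_b) <= population):
        raise ValueError("inconsistent list sizes")
    total = comb(population, list_b)
    tail = sum(
        comb(list_a, k) * comb(population - list_a, list_b - k)
        for k in range(overlap, min(list_a, list_b) + 1)
    )
    return tail / total


# Hypothetical lists: 200 stem cell-associated and 150 cancer-associated
# transcripts sharing 20 members, out of 12,625 transcripts on the array.
p_value = hypergeom_overlap_pvalue(12625, 200, 150, 20)
print(f"P(overlap >= 20) = {p_value:.3g}")
```

With these invented sizes the expected overlap by chance is about 200×150/12,625, or roughly 2.4 genes, so an observed overlap of 20 yields a very small tail probability.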

Polycomb pathway “stemness” signatures: The initial analysis was performed using two cancer therapy outcome data sets: the 79-patient prostate cancer data set and the 286-patient breast cancer data set. For each parent signature (Table 4), a multivariate Cox regression analysis was carried out. Consistent with the concept that therapy-resistant and therapy-sensitive tumors develop within distinct Polycomb-driven “stemness”/differentiation programs, all signatures were found to generate statistically significant models of cancer therapy outcome. To reduce the number of predictors in each signature and eliminate redundancy, all probe sets with low independent predictive values (typically, with p>0.1 in multivariate Cox regression analysis) were removed from further analysis. These steps generated nine cancer therapy outcome signatures, listed in Table 4, all of which provide statistically significant therapy outcome models in multivariate Cox regression analysis in multiple cancer therapy outcome data sets. For each patient, the expression values of all genes comprising a signature were combined into a single numerical value using either the Pearson correlation coefficient approach or the weighted coefficient method as described previously. These numerical values provide the cancer therapy outcome predictor (CTOP) scores for each signature for every individual patient. The log10-transformed fold change expression values or individual weighted coefficients obtained from the multivariate Cox regression analysis were used as multidimensional numerical vectors in the Pearson and weighted methods, respectively. Kaplan-Meier survival analysis was performed to assess the patient stratification performance of each signature. 
Patients were sorted in descending order based on the numerical values of the CTOP scores, and survival curves were generated by designating the patients with the top 50% and bottom 50% of scores as the poor prognosis and good prognosis groups, respectively. These analytical protocols were independently carried out for the 79-patient prostate cancer data set and the 286-patient breast cancer data set. Gene expression signatures generated using the 286-patient breast cancer data set were utilized in subsequent analyses of four additional independent breast cancer data sets as well as lung cancer and ovarian cancer data sets (Table 3).
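
The Pearson-based scoring and the 50/50 stratification described above can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the reference vector, patient identifiers, and expression values are hypothetical, the function names are not from the source, and the handling of an odd number of patients (the extra patient goes to the good prognosis group) is an arbitrary choice made for the sketch.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def split_by_score(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Sort patients by descending score; top half = poor, rest = good prognosis."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Hypothetical log10 fold-change reference vector for a 4-gene signature.
reference = [0.5, -0.3, 0.8, -0.6]
# Hypothetical log10 fold-change profiles of four patients.
patients = {
    "P1": [0.4, -0.2, 0.9, -0.5],
    "P2": [-0.5, 0.3, -0.8, 0.6],
    "P3": [0.1, 0.1, 0.2, -0.1],
    "P4": [-0.2, 0.4, -0.1, 0.3],
}
scores = {pid: pearson(reference, profile) for pid, profile in patients.items()}
poor, good = split_by_score(scores)
print("poor prognosis:", poor)
print("good prognosis:", good)
```

The two resulting groups would then be compared by Kaplan-Meier analysis and a log-rank test, which are omitted here to keep the sketch short.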

Example 7 Random Co-Occurrence Test

A 10,000-permutation test was performed to check how likely small gene signatures derived from the large signature would be to display high discrimination power, assessing significance at the 0.1% level as described earlier. The 10,000 permutations generated 7 random 11-gene signatures performing at the sample classification level of the 11-gene MTTS/PNS signature.
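
The permutation result above corresponds to an empirical frequency of 7/10,000 = 0.0007, which is below the 0.1% (0.001) level. The Python sketch below shows one way such a random co-occurrence test could be organized. It is an assumption-laden illustration: the function name, the gene pool, and the toy scoring function are hypothetical, and the actual classification metric used in this work is not reproduced. Some conventions add one to both numerator and denominator; the plain ratio is used here.

```python
import random
from typing import Callable, Sequence, Tuple


def random_signature_test(
    gene_pool: Sequence,
    signature_size: int,
    observed_score: float,
    score_fn: Callable[[Sequence], float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Count random signatures that score at least as well as the observed one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_permutations):
        candidate = rng.sample(gene_pool, signature_size)
        if score_fn(candidate) >= observed_score:
            hits += 1
    return hits, hits / n_permutations


# Arithmetic for the result reported in the text: 7 hits in 10,000 permutations.
empirical_p = 7 / 10_000
print(f"empirical frequency = {empirical_p}, significant at 0.1% level: {empirical_p < 0.001}")

# Toy usage: genes are integers, and a signature's score is the number of
# its members drawn from a hypothetical set of 11 'informative' genes.
toy_pool = list(range(100))
toy_hits, toy_p = random_signature_test(
    toy_pool, 11, observed_score=11,
    score_fn=lambda sig: sum(g < 11 for g in sig),
    n_permutations=1000,
)
```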

Example 8 Weighted Survival Predictor Score Algorithm

The weighted survival score analysis was implemented to reflect the incremental statistical power of the individual covariates as predictors of therapy outcome based on a multi-component prognostic model. The microarray-based or Q-RT-PCR-derived gene expression values were normalized and log-transformed on a base 10 scale. The log-transformed normalized expression values for each data set were analyzed in a multivariate Cox proportional hazards regression model, with overall survival or event-free survival as the dependent variable. To calculate the survival/prognosis predictor score for each patient, the log-transformed normalized gene expression value measured for each gene was multiplied by a coefficient derived from the multivariate Cox proportional hazards regression analysis. The final survival predictor score comprises a sum of scores for individual genes and reflects the relative contribution of each of the eleven genes in the multivariate analysis. Negative weighting values indicate that higher expression correlates with longer survival and favorable prognosis, whereas positive weighting values indicate that higher expression correlates with poor outcome and shorter survival. Thus, the weighted survival predictor model is based on a cumulative score of the weighted expression values of eleven genes. For example, the following equation describes the relapse-free survival predictor score for prostate cancer patients (Table 4): CTOP score=(−0.403×Gbx2)+(1.2494×KI67)+(−0.3105×Cyclin B1)+(−0.1226×BUB1)+(0.0077×HEC)+(0.0369×KIAA1063)+(−1.7493×HCFC1)+(−1.1853×RNF2)+(1.5242×ANK3)+(−0.5628×FGFR2)+(−0.4333×CES1).
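
The weighted score above is a plain weighted sum, which the following Python sketch makes explicit. The eleven coefficients are copied from the equation in the text. The function name, the dictionary layout, and the example expression values are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Dict, Mapping

# Cox regression coefficients from the prostate cancer CTOP equation in the text.
PROSTATE_CTOP_WEIGHTS: Dict[str, float] = {
    "Gbx2": -0.403,
    "KI67": 1.2494,
    "Cyclin B1": -0.3105,
    "BUB1": -0.1226,
    "HEC": 0.0077,
    "KIAA1063": 0.0369,
    "HCFC1": -1.7493,
    "RNF2": -1.1853,
    "ANK3": 1.5242,
    "FGFR2": -0.5628,
    "CES1": -0.4333,
}


def ctop_score(
    log10_expression: Mapping[str, float],
    weights: Mapping[str, float] = PROSTATE_CTOP_WEIGHTS,
) -> float:
    """Sum of (coefficient x log10 normalized expression) over all signature genes."""
    missing = sorted(set(weights) - set(log10_expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(w * log10_expression[gene] for gene, w in weights.items())


# Hypothetical patient: every gene at a log10 normalized expression of 0.0,
# except KI67 and ANK3 (positive weights), which are elevated tenfold.
example = {gene: 0.0 for gene in PROSTATE_CTOP_WEIGHTS}
example["KI67"] = 1.0
example["ANK3"] = 1.0
example_score = ctop_score(example)
print(f"CTOP score = {example_score:.4f}")
```

In this invented example only the two positively weighted genes contribute, so the score equals 1.2494 + 1.5242 = 2.7736; a higher score points toward the poor prognosis direction described in the text.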

Example 9 Immunofluorescence Microscopy

Cells fixed with 3.7% paraformaldehyde in phosphate-buffered saline (PFA/PBS) for 15 minutes were permeabilized with 0.5% Triton-X100 (Sigma, St. Louis, Mo., USA)/PBS for 5 min. After washing in PBS, cells were incubated in PBS containing 100 mM glycine for 10 min. Primary antibodies were diluted in 0.5% BSA/0.05% cold-water fish skin gelatin/PBS, and cells were incubated in this buffer for 10 min before antibodies were applied for 16 hrs at room temperature. After washing in PBS buffer, cells were incubated with secondary antibodies at 1:500 dilution. Coverslips were mounted in Prolong (Molecular Probes, Inc.). Images were collected on an inverted microscope (Olympus IX70) equipped with a DeltaVision imaging system using a ×40 objective. Images were processed with softWoRx v.2.5 software (Applied Precision Inc., Issaquah, Wash.) and quantified using ImageJ 1.29x software.

Quantitative immunofluorescence analysis of PcG protein expression was performed using human prostate cancer tissue microarrays (TMAs) representing 46 prostate tissue samples (thirty-nine cases of prostate cancer and seven cases of normal prostate). Analysis was carried out on the prostate cancer TMAs from Chemicon (Temecula, Calif.; TMA # 3202-4; four cancer cases and two cases of normal tissue; and TMA # 1202-4; twenty-five cases of cancer and five cases of normal tissue) and a TMA of 10 cases of prostate cancer from the SKCC tumor bank (San Diego, Calif.). The TMAs contain two 2.0 mm cores of each case and haematoxylin-and-eosin (H&E) sections, which were used for visual selection of the pathological tissues, histological diagnosis, and grading by the pathologists of the TMA providers.

Four- or five-micrometer paraffin-embedded sections were baked at 56° C. for 1 hour, allowed to cool for about 5 minutes, dewaxed in xylene, and rehydrated in a series of graded alcohols. Antigen retrieval was achieved by boiling slides in 10 mM sodium citrate buffer, 0.05% Tween 20, pH 6.0 in a water bath for 30 minutes. The sections were washed with PBS, incubated in 100 mM glycine/PBS for 10 minutes, blocked in 0.5% BSA/0.05% cold-water fish skin gelatin/PBS, and incubated with primary antibody overnight.

Primary antibodies were EZH2 rabbit polyclonal antibody (1:50), BMI1 mouse monoclonal IgG1 antibody (1:50), ubiH2A mouse IgM (1:100), and 3metK27 rabbit polyclonal antibody (1:100) (Upstate, Lake Placid, N.Y.). Suz12 rabbit (1:50) and AMACR rabbit (1:50) antibodies and Dicer mouse IgG1 (1:20) were purchased from Abcam (Cambridge, Mass.). BMI1 rabbit (1:50) and TRAP100 goat (1:50) antibodies were from Santa Cruz Biotechnology (Santa Cruz, Calif.). Cyclin D1 rabbit polyclonal antibody (1:50) was from Biocare Medical (Concord, Calif.). EZH2 mouse monoclonal antibodies were kindly provided by Dr. A. P. Otte.

The primary antibodies were rinsed off with PBS and slides were incubated with secondary antibodies at 1:300 dilutions for 1 hour at room temperature. Secondary antibodies (chicken antirabbit Alexa 594, goat antimouse Alexa 488, goat antimouse IgG1 Alexa 350, and donkey antigoat Alexa 488 conjugates) were from Molecular Probes (Eugene, Oreg.). The slides were washed four times in PBS for five minutes each, rinsed in distilled water, and the specimens were coverslipped with Prolong Gold Antifade Reagent (Molecular Probes, Eugene, Oreg.) containing DAPI. For negative controls, the primary antibodies were omitted. Three samples were excluded from analysis because of one of the following reasons: core loss, unrepresentative sample, or sub-optimal DNA and antigen preservation.

Images were collected on an inverted fluorescent microscope (LEICA DMIRE 2 or Olympus IX70) using a ×40 objective. Images were processed with Leica FW4000 software and quantified using ImageJ 1.29x software (http://rsb.info.nih.gov/ij). Expression values were measured in at least 200 nuclei from two microscopic fields for each case.

The measurements were carried out in the nuclei of individual cells defined by DAPI staining both in experimental and clinical samples. For experimental samples, the comparison thresholds for each marker combination were defined at the 90-95% exclusion levels for dual positive cells in corresponding control samples (parental low metastatic cells). For clinical samples, the comparison thresholds for each marker combination were defined at the 99% or greater exclusion levels for dual positive cells in corresponding control samples (normal epithelial cells in TMA experiments). All individual immunofluorescent assay experiments (defined as the experiments in which the corresponding comparisons were made) were carried out simultaneously using the same reagents and included all experimental samples and controls utilized for a quantitative analysis. Statistical significance of the measurements was ascertained and consistency of the findings was confirmed in multiple independent experiments, including several independent sources of the prostate cancer TMA samples.
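
The thresholding logic described above can be illustrated with a short Python sketch. It rests on one interpretation, stated here as an assumption: an "exclusion level" is taken to be the fraction of control nuclei that are not dual-positive at a given pair of intensity thresholds. The function names, threshold values, and intensity data are hypothetical; the 99% exclusion level is the one stated for clinical samples, and the >1% dual-positive criterion is the one used in the TMA analysis of prostate carcinomas.

```python
from typing import List, Sequence, Tuple

Nucleus = Tuple[float, float]  # (marker A intensity, marker B intensity)


def fraction_dual_positive(nuclei: Sequence[Nucleus], thr_a: float, thr_b: float) -> float:
    """Fraction of nuclei whose intensities exceed both marker thresholds."""
    if not nuclei:
        raise ValueError("at least one nucleus is required")
    return sum(a > thr_a and b > thr_b for a, b in nuclei) / len(nuclei)


def exclusion_level(controls: Sequence[Nucleus], thr_a: float, thr_b: float) -> float:
    """Fraction of control nuclei that are NOT dual-positive at these thresholds."""
    return 1.0 - fraction_dual_positive(controls, thr_a, thr_b)


# Hypothetical nuclear intensities (arbitrary units), 200 nuclei per sample.
control_nuclei: List[Nucleus] = [(10.0, 10.0)] * 199 + [(60.0, 60.0)]
tumor_nuclei: List[Nucleus] = [(10.0, 10.0)] * 190 + [(80.0, 70.0)] * 10
THR_BMI1, THR_EZH2 = 50.0, 50.0  # hypothetical thresholds

control_exclusion = exclusion_level(control_nuclei, THR_BMI1, THR_EZH2)
thresholds_ok = control_exclusion >= 0.99           # clinical-sample criterion
tumor_fraction = fraction_dual_positive(tumor_nuclei, THR_BMI1, THR_EZH2)
tumor_increased = tumor_fraction > 0.01             # ">1%" dual-positive cells
print(f"control exclusion {control_exclusion:.1%}, tumor dual-positive {tumor_fraction:.1%}")
```

In practice the thresholds would be raised until the control exclusion level reaches the required value, and the same thresholds would then be applied to every sample processed in the same experiment.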

Example 10 Orthotopic Xenografts

Orthotopic xenografts of human prostate PC-3 cells and prostate cancer metastasis precursor sublines used in this study were developed by surgical orthotopic implantation as previously described in Glinsky et al (2003), supra. Briefly, 2×10⁶ cultured PC-3 cells or sublines were injected subcutaneously into male athymic mice, and allowed to develop into firm palpable and visible tumors over the course of 2-4 weeks. Intact tissue was harvested from a single subcutaneous tumor and surgically implanted in the ventral lateral lobes of the prostate gland in a series of ten athymic mice per cell line subtype as described in Glinsky et al (2003), supra. During orthotopic cell inoculation experiments, a single-cell suspension of 1.5×10⁶ cells was injected into mouse prostate gland in a series of ten athymic mice per therapy group.

Example 11 Fluorescence In Situ Hybridization (FISH)

The PC3 human prostate adenocarcinoma cell line, the derived subline PC3-32, and diploid human fibroblast BJ1-hTERT cells were used for the assessment of gene amplification status. The cyanine-3 or cyanine-5 labeled BAC clone RP11-28C14 was used for the EZH2 locus (7q35-q36), the BAC clone RP11-232K21 was used for the BMI1 locus (10p11.23), the BAC clone RP11-440N18 was used for the Myc locus (8q24.12-q24.13), and the BAC clone RP-11-1112H21 was used for the LPL locus (8p22). FISH analysis was performed according to the protocol described previously.

Methanol/glacial acetic acid cell fixation: Cell cultures were synchronized with 4 μg/ml aphidicolin (Sigma Chemical Co.) for 17 hours at 37° C. Synchronized cells were subjected to hypotonic treatment in 0.56% KCl for 20 minutes at 37° C., followed by fixation in Carnoy's fixative (3:1 methanol:glacial acetic acid). The cell suspension was dropped onto glass slides and air dried. The slides were treated for 30 minutes with 0.005% pepsin in 0.01N HCl at room temperature and then dehydrated through a series of washes in 70%, 85%, and 100% ethanol. Denaturation of DNA was performed by plunging the slides into a Coplin jar containing 70% formamide/2×SSC (pH 7.0) for 30 min at 75° C. The slides were immediately plunged into ice-cold 2×SSC and then dehydrated as before.

Fluorescence in situ hybridization (FISH): All BAC clones were obtained from the Roswell Park Cancer Institute (RPCI, Buffalo, N.Y.). The BAC DNA was labeled with Cy3-dCTP or Cy5-dCTP (Perkin Elmer Life Sciences, Inc.) using the BioPrime DNA Labeling System (Invitrogen). The resultant probes were purified with the QIAquick PCR Purification Kit (Qiagen). DNA recovery and the amount of incorporated Cy3 or Cy5 were verified by Nanodrop spectrophotometry.

Prior to hybridization, the probe was precipitated with 20 μg competitor human Cot-1 DNA (per 18×18 mm coverslip) and washed in 70% ethanol. The dried pellet was thoroughly resuspended in 10 μl hybridization buffer (2×SSC, 20% dextran sulfate, 1 mg/ml BSA; NEB Inc.). The denatured probe solution was deposited onto the cells on the slide. Hybridization was carried out overnight at 42° C. in a dark humidified chamber. After three washes in 50% formamide/2×SSC (adjusted to pH 7.0) and three washes in 2×SSC at 42° C., slides were counterstained and mounted in Prolong Gold Antifade Reagent with 4′,6-diamidino-2-phenylindole (DAPI; Invitrogen). Slides were examined using a Leica DMIRE2 fluorescence microscope (Leica, Deerfield, Ill.). Gene amplification status was determined by scoring 60-100 nuclei.
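For illustration, the per-nucleus scoring may be tabulated as in the following sketch. It assumes that each scored nucleus is recorded as a pair of signal counts, one for the target probe and one for a reference probe. The choice of reference locus, the summary statistics, and the function name are assumptions made for this example, and no amplification cutoff is implied.

```python
from statistics import mean

def summarize_fish_counts(nuclei):
    """Summarize FISH signal counts.

    nuclei: list of (target_signals, reference_signals), one pair per scored nucleus.
    """
    if not nuclei:
        raise ValueError("at least one scored nucleus is required")
    target_mean = mean(t for t, _ in nuclei)
    reference_mean = mean(r for _, r in nuclei)
    return {
        "nuclei_scored": len(nuclei),
        "mean_target_signals": target_mean,
        "mean_reference_signals": reference_mean,
        # Ratio above 1 means more target than reference signals on average.
        "target_to_reference_ratio": target_mean / reference_mean,
        # Share of nuclei carrying more than the two signals expected in a diploid cell.
        "fraction_above_two_target_signals":
            sum(1 for t, _ in nuclei if t > 2) / len(nuclei),
    }

# Hypothetical counts for 60 nuclei: half with four target signals, half with two.
scored = [(4, 2)] * 30 + [(2, 2)] * 30
summary = summarize_fish_counts(scored)
```

Comparing such a summary for a tumor cell line against the same summary for diploid control cells is one way the scored nuclei might be interpreted.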

Example 12 siRNA Experiments

The target siRNA SMART pools and chemically modified degradation-resistant variants of the siRNAs (stable siRNAs) for BMI1, Ezh2, and control luciferase siRNAs were purchased from Dharmacon Research, Inc. siRNAs were transfected into human prostate carcinoma cells according to the manufacturer's protocols. Cell cultures were continuously monitored for growth and viability and assayed for mRNA expression levels of BMI1, Ezh2, and a selected set of genes using RT-PCR and Q-RT-PCR methods. Eight individual siRNA sequences comprising the SMART pools (four sequences for each gene, BMI1 and Ezh2) were tested, and the single most effective siRNA sequence for each gene was selected for synthesis in the chemically modified stable siRNA form. The siRNA treatment protocol [two consecutive treatments of cells in adherent cultures with 100 nM (final concentration) of Dharmacon degradation-resistant siRNAs at days 1 and 4 after plating], as designed, caused only a moderate reduction in the average BMI1 and Ezh2 protein expression levels (20-50% maximal effect) and had no or only a marginal effect on cell proliferation in the adherent cultures (at most ˜25% reduction in cell proliferation).

Example 13 Quantitative RT-PCR Analysis

The real-time PCR method measures the accumulation of PCR products with a fluorescence detector system and allows quantification of the amount of amplified PCR product in the log phase of the reaction. Total RNA was extracted using the RNeasy mini-kit (Qiagen, Valencia, Calif., USA) following the manufacturer's instructions. A measure of 1 μg (tumor samples), or 2 μg and 4 μg (independent preparations of reference cDNA and DNA samples from cell culture experiments) of total RNA was then used as a template for cDNA synthesis with SuperScript II (Invitrogen, Carlsbad, Calif., USA). The cDNA synthesis step was omitted in the DNA copy number analysis (32). Q-PCR primer sequences were selected for each cDNA and DNA with the aid of Primer Express™ software (Applied Biosystems, Foster City, Calif., USA). PCR amplification was performed with the gene-specific primers.

Q-PCR reactions and measurements were performed with SYBR Green, and with ROX as a passive reference, using the ABI 7900 HT Sequence Detection System (Applied Biosystems, Foster City, Calif., USA). Conditions for the PCR were as follows: one cycle of 10 min at 95° C., followed by 40 cycles of 0.20 min at 94° C., 0.20 min at 60° C., and 0.30 min at 72° C. The results were normalized to the relative expression of the endogenous control gene, GAPDH.

Expression of messenger RNA (mRNA) and DNA copy number for target genes and an endogenous control gene (GAPDH) were measured by the real-time PCR method on an ABI PRISM 7900 HT Sequence Detection System (Applied Biosystems). For each gene, at least two sets of primers were tested, and the set with the highest amplification efficiency was selected for the assay used in this study. Specificity of the assay for mRNA measurements was confirmed by the absence of the expected PCR products when genomic DNA was used as a template. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH: 5′-CCCTCAACGACCACTTTGTCA-3′ [SEQ ID NO.:1] and 5′-TTCCTCTTGTGCTCTTGCTGG-3′ [SEQ ID NO.:2]) was used as the endogenous RNA and cDNA quantity normalization control. For calibration and generation of standard curves, several reference cDNAs were prepared: cDNA prepared from primary in vitro cultures of normal human prostate epithelial cells, cDNA derived from the PC-3M human prostate carcinoma cell line, and cDNA prepared from normal human prostate. For DNA copy number analysis, human placental DNA was used as a normalization control. Expression and DNA copy number of all genes were assessed in at least two independent experiments using reference cDNAs to control for variations among different Q-RT-PCR experiments. Prior to statistical analysis, the normalized gene expression values were log-transformed (on a base 10 scale) similarly to the transformation of the array-based gene expression data.
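The calibration, normalization, and log transformation described above may be illustrated by the following sketch. It assumes a conventional linear standard curve relating threshold cycle (Ct) to the base 10 logarithm of template quantity, fitted by least squares to a dilution series of reference cDNA. The Ct values, dilution series, and function names are hypothetical and are not data from this study.

```python
import math

def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1 / slope) - 1

def quantity_from_ct(ct, slope, intercept):
    """Read a relative template quantity off the standard curve."""
    return 10 ** ((ct - intercept) / slope)

def log10_normalized_expression(target_quantity, gapdh_quantity):
    """Normalize to the endogenous control, then log-transform (base 10)."""
    return math.log10(target_quantity / gapdh_quantity)

# Hypothetical ten-fold dilution series of a reference cDNA.
slope, intercept = fit_standard_curve([0, 1, 2, 3], [30.0, 26.5, 23.0, 19.5])
efficiency = amplification_efficiency(slope)

# Hypothetical sample: target gene at Ct 23.0, GAPDH at Ct 19.5 on the same curve.
target_q = quantity_from_ct(23.0, slope, intercept)
gapdh_q = quantity_from_ct(19.5, slope, intercept)
value = log10_normalized_expression(target_q, gapdh_q)
```

In practice a separate standard curve would typically be fitted for each primer set. A single curve is reused here only to keep the example short.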

Example 14 Survival Analysis

The Kaplan-Meier survival analysis was carried out using the GraphPad Prism version 4.00 software (GraphPad Software, San Diego, Calif.). The end point for survival analysis in prostate cancer was the biochemical recurrence defined by the serum PSA increase after therapy. Disease-free interval (DFI) was defined as the time period between the date of radical prostatectomy (RP) and the date of PSA relapse (recurrence group) or date of last follow-up (non-recurrence group). Statistical significance of the difference between the survival curves for different groups of patients was assessed using Chi square and Log-rank tests. To evaluate the incremental statistical power of the individual covariates as predictors of therapy outcome and unfavorable prognosis, both univariate and multivariate Cox proportional hazard survival analyses were performed. Clinico-pathological covariates included in this analysis were preoperative PSA, Gleason score, surgical margins, extra-capsular invasion, seminal vesicle invasion, and age. 
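The analysis above was carried out in GraphPad Prism. Purely as an illustration of the underlying calculation, the disease-free interval and the Kaplan-Meier product-limit estimate may be sketched as follows. The patient records are hypothetical, the function names are assumptions, and the sketch does not reproduce the log-rank or Cox analyses.

```python
from datetime import date

def disease_free_interval(rp_date, relapse_date=None, last_followup=None):
    """Return (days, event): time from radical prostatectomy to PSA relapse
    (event=1) or to last follow-up when no relapse was recorded (event=0)."""
    if relapse_date is not None:
        return (relapse_date - rp_date).days, 1
    return (last_followup - rp_date).days, 0

def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, survival) at each event time."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # relapsed and censored patients both leave the risk set
    return curve

# Hypothetical disease-free intervals in months; 1 = PSA relapse, 0 = censored.
curve = kaplan_meier([6, 12, 12, 24, 30], [1, 0, 1, 1, 0])

# Hypothetical patient with a PSA relapse 30 days after surgery.
dfi = disease_free_interval(date(2000, 1, 1), relapse_date=date(2000, 1, 31))
```

Curves computed in this way for two patient groups are the inputs that a log-rank test compares.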

1. A method for diagnosing a more aggressive form of prostate, breast, lung, or ovarian cancer in a subject comprising determining a cumulative cancer therapy outcome predictor (CTOP) score for six gene expression signatures, wherein the six gene expression signatures are TEZ, EED, SUZ12/POLII, SUZ12, NANOG/SOX2/OCT4, and PCG-TF, wherein the CTOP score for each signature is determined by measuring the mRNA expression of each gene comprising the signature in a sample from the subject, and wherein a higher cumulative CTOP score indicates a more aggressive form of the cancer.
 2. The method of claim 1, further comprising determining the CTOP score for one or more additional gene expression signatures selected from the group consisting of BCD TF, ESC3, BMI-1 pathway, PcG methylation, and histones H3/H2A.
 3. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 36-transcript TEZ CTOP signature comprising transcripts of the following genes: GNG10, SOX9, BCI2, PURA, LIF, MN1, HOXD, BAI3, GFRA2, SIX1, CACNAL, SIX2, SMARCA2, and SMARCA2.
 4. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 20-transcript EED CTOP signature consisting of transcripts of the following genes: CCND2, YY1, SOX9, PDGFRA, EZH2, KIT, PAX6, DICER1, LHX2, BMP6, HOXC6, PAX9, SOX5, WNT8B, ESRRB, FOXI1, EED, YY1, CKAP2, and CBX8.
 5. The method of claim 1, wherein the cancer is prostate cancer and the one or more CTOP signatures comprises a 36-transcript EED CTOP signature consisting of transcripts of the following genes: PAX8, YY1, PCNA, IGF2, HPRT1, SOX9, PDGFRA, EZH2, FGFR3, HOXB7, BMPR1A, TCF21, TAL1, CRABP1, HOXB2, GATA4, FGF7, BMP6, HOXA4, CBX4, HOXC5, HOXC6, DLX2, FOXH1, SOX21, NEUROG1, EED, GATA3, GATA1, MYST4, YY1, HOXA2, PAX8, PDGFRA, TAL1, and DKK2.
 6. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 20-transcript SUZ12/POLII CTOP signature consisting of transcripts of the following genes: CXCR4, KIAA0368, PLD1, NID2, ATF3, KIAA0427, DUSP4, SMTN, GADD45B, PRKCH, JUN, PCDH17, TIMP3, PLEC1, LIF, SOX9 and CDH11.
 7. The method of claim 1, wherein the cancer is prostate cancer and the one or more CTOP signatures comprises a 22-transcript SUZ12/POLII CTOP signature consisting of transcripts of the following genes: PAK1, ING1, POLR3K, PLD1, BAIAP2, BDNF, EGFR, FBN2, KIAA0427, CAST, GADD45B, PRKCH, CYP1B1, and JUN.
 8. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 25-transcript SUZ12 CTOP signature consisting of transcripts of the following genes: SFRP1, COL15A1, DFNA5, THBD, PSMB9, MYO5A, MN1, RASGRP1, SYNGR3, RELN, SALL1, MMP25, RASGRP2, GATA3, GATA6, PCDH7, HLA-G, TFAP2B, FLJ10159, DLL3, GNA14, TGFB2, WNT5B, and BACH2.
 9. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 41-transcript NANOG/SOX2/OCT4 CTOP signature consisting of transcripts of the following genes: RPS3A, TOP2A, BUB3, TALDO1, USP7, SNRPN, ZFP36L1, GNG10, SFRP1, LAMA4, FEZ1, BUB1B, MTM1, TIF1, NFE2L3, DTNA, CA4, TDGF1, 1F116, TLE3, SPAG9, ZIC3, FGFR2, 1F116, STATS, PPAP2A, PPAP2A, FGFR2, PHF8, LARGE, RPS18, FUS, FLJ10769, DUSP12, FLJ10652, FLJ11029, RAD54B, NANOG, FEZ1, FEZ1, and MTM1.
 10. The method of claim 1, wherein the cancer is prostate cancer and the one or more CTOP signatures comprises a 28-transcript NANOG/SOX2/OCT4 CTOP signature consisting of transcripts of the following genes: JUP, BUB3, ICMT, GNG10, SET, PPAP2A, LARGE, FUS, and NUCKS.
 11. The method of claim 1, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 30-transcript PcG TF CTOP signature consisting of transcripts of the following genes: ATF3, SOX9, ZFHX1B, JUN, CEBPA, NPAS2, NPAS2, DACH1, SIX1, EGR3, FOXF2, CART1, HOXC6, PAX9, DIX2, SIX6, ALX3, HOXB3, POU4F3, POU3F3, NR2F2, MAFA, PAX1, GSH1, WT1, SOX18, LHX6, TBR1, and EN1.
 12. The method of claim 1, wherein the cancer is prostate cancer and the one or more CTOP signatures comprises a 21-transcript PcG TF CTOP signature consisting of transcripts of the following genes: JUNB, HOXB2, NPAS2, IRF5, PAX6, SIX1, WT1, HOXA4, HOXC6, NKX2B, HOXD13, HOXB1, NEUROG1, FOXB1, GATA6, BARX2, LEF1, NHLH2, TAL1, TBR1, and BACH2.
 13. The method of claim 2, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 26-transcript BCD TF CTOP signature consisting of transcripts of the following genes: DLX2, EN2, HOXA1, HOXA13, HOXA2, HOXA5, HOXB2, HOXB3, HOXB6, HOXB8, HOXC13, HOXC5, HOXD3, IRX2, IRX4, LMO1, NR2F2, PAX1, PAX2, SATB1, SHH, SIX2, TLX3, ZFHX1B, HOXA7, and TITF1.
 14. The method of claim 2, wherein the cancer is prostate cancer and the one or more CTOP signatures comprises a 31-transcript BCD TF CTOP signature consisting of transcripts of the following genes: SATB1, HOXD9, PAX6, LMO4, EBF1, FOXA1, FOXD3, HOXA, HOXB, HOXC, HOXD, IRX4, MAF, MRG1, NKX2-2, and NKX6-2.
 15. The method of claim 2, wherein the one or more CTOP signatures comprises an 11-transcript BMI-1 CTOP signature consisting of transcripts of the following genes: GBX2, MKI67, CYCLIN B1, BUB1, HEC, KIAA1063, HCFC1, RNF2, ANK3, FGFR2, and CES1.
 16. The method of claim 2, wherein the cancer is breast cancer or lung cancer and the one or more CTOP signatures comprises a 35-transcript ESC3 CTOP signature consisting of transcripts of the following genes: PTPRF, MARCKS, FLNA, TNFAIPL, P1S3, IRAK1, MARCKS, PYCR1, TM4SF2, NDST1, DBN1, NSMAF, FZD7, AQP3, SSBP2, NKAT, CHC1I, FST, GSTT2, CDS1, CNTFR, GGT1, FST, GGT1, EPHX2, CDKN2A, CD24A, EFS, BATT, TNNT1, IFITM3, PRDM4, SOX18, LEPREL, and MDFI.
 17. A method for prognosis of prostate, breast, lung, or ovarian cancer in a subject having prostate, breast, lung, or ovarian cancer comprising determining a cumulative cancer therapy outcome predictor (CTOP) score for six gene expression signatures, wherein the six gene expression signatures are TEZ, EED, SUZ12/POLII, SUZ12, NANOG/SOX2/OCT4, and PCG-TF, wherein the cumulative CTOP score is the sum of the CTOP scores for each signature determined by measuring the mRNA expression of each gene comprising the signature in a sample from the subject, and wherein a higher cumulative CTOP score indicates a more aggressive form of the cancer and a poor prognosis. 